Claude Elwood Shannon
Miscellaneous Writings
Edited by
N. J. A. Sloane
Aaron D. Wyner
Back in 1993, the late Aaron Wyner and I edited Claude Elwood Shannon's
papers, and most of them appeared in a volume (Claude Elwood
Shannon's Collected Papers) which was published by the IEEE Press.
However, there were a number of items written by Shannon of lesser
interest which we did not include (some declassified wartime memoranda,
obscure AT&T Bell Labs memos, some mimeographed MIT lecture notes, etc.).
These we put into a binder, held together by an Acco metal strip.
We made half a dozen copies, and gave copies to the Library
of Congress, the British Library, the Bell Laboratories Library,
the MIT Library, to Claude Shannon himself, and to one or two other places.
Over the years many people have asked me if it was possible to get access
to this collection.
I have now had this volume scanned and converted to PDF files.
The total size of the files is about 450 megabytes.
Neil J. A. Sloane, October 13, 2013
Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill,
New Jersey 07974
CONTENTS
File 1: Front matter
This volume contains the following items. Bracketed numbers refer to the bibliography.
(Note that many of these files contain more than one document.)
File 5: [5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.
File 7: [7] "A Study of the Deflection Mechanism and Some Results on Rate Finders," Report to National Defense Research Committee, Div. 7-311-M1, circa April, 1941, 37 pp. + 15 figs.
File 9: [9] "A Height Data Smoothing Mechanism," Report to National Defense Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.
File 11: [11] "Some Experimental Results on the Deflection Mechanism," Report to National Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.
File 12: [12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8, 1941, 5 pp. + 3 figs.
File 16: [16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense Research Committee, July 15, 1943, 9 pp.
File 16: [19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944, Bell Laboratories, 2 pp. + 3 figs.
File 16: [20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell Laboratories, 1 p. + 1 fig.
File 21: [21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.
File 21: [23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript, August 4, 1944, Bell Laboratories, 4 pp.
File 24: [24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs.
File 26: [26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell Laboratories, 17 pp.
File 27: [27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in Fire-Control Systems," Summary Technical Report, Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in National Military Establishment Research and Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory and Practice, Addison-Wesley, Reading, Mass., 1965.
File 30: [30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.
File 31: [31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946, Bell Laboratories, 5 pp. + 1 fig.
File 31: [32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5 pp. + 1 fig.
File 31: [34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp.
File 31: [35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15, 1948, 2 pp.
File 36: [36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
File 36: [45] "Significance and Application [of Communication Research]," Symposium on Communication Research, 11-13 October, 1948, Research and Development Board, Department of Defense, Washington, DC, pp. 14-23, 1948.
File 46: [46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell Laboratories, 1 p.
File 46: [47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18, 1948, Bell Laboratories, 2 pp.
File 46: [48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6, 1948, Bell Laboratories, 2 pp. + 2 figs.
File 46: [49] "Information Theory," Typescript of abstract of talk for American Statistical Society, 1949, 5 pp.
File 46: [58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 pp.
File 59: [59] "A Digital Method of Transmitting Information," Typescript, no date, circa 1950, Bell Laboratories, 3 pp.
File 59: [72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.
File 59: [74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.
File 59: [77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 pp.
File 78: [78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.
File 78: [81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.
File 78: [84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.
File 78: [87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 fig.
File 78: [95] "Concavity of Transmission Rate as a Function of Input Probabilities," Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.
File 104: [104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology, 1956 and succeeding years. Contains the following sections:
"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of martingales and related questions," 19 pp. "Some useful inequalities for distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9 pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp. "Upper and lower bounds for powers of a matrix with non-negative elements," 3 pp. "The number of sequences of a given length," 3 pp. "Characteristic for a language with independent letters," 4 pp. "The probability of error in optimal codes," 5 pp. "Zero error codes and the zero error capacity C0," 10 pp. "Lower bound for Pef for a completely connected channel with feedback," 1 p. "A lower bound for Pe when R > C," 2 pp. "A lower bound for Pe," 2 pp. "Lower bound with one type of input and many types of output," 3 pp. "Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for the memoryless feedback channel," 1 p. "Continuity of Pe opt as a function of transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of Pe to p," 2 pp. "Bound on Pe for random code by simple threshold argument," 4 pp. "A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2 pp. "Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a function of transition probabilities," 1 p. "A geometric interpretation of channel capacity," 6 pp. "Log moment generating function for the square of a Gaussian variate," 2 pp. "Upper bound on Pe for Gaussian channel by expurgated random code," 2 pp. "Lower bound on Pe in Gaussian channel by minimum distance argument," 2 pp. "The sphere packing bound for the Gaussian power limited channel," 4 pp. "The r-terminal channel," 7 pp. "Conditions for constant mutual information," 2 pp. "The central limit theorem with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the distribution function," 5 pp. "Generalized Chebycheff and Chernoff inequalities," 1 p. "Channels with side information at the transmitter," 13 pp. "Some miscellaneous results in coding theory," 15 pp. "Error probability bounds for noisy channels," 20 pp.
File 105: [105] "Reliable Machines from Unreliable Components," notes of five lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp.
File 105: [106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.
File 105: [107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.
File 105: [108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
File 105: [124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7 pp. + 8 figs.
File 105: [127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.
Claude Elwood Shannon
Miscellaneous Writings
Edited by
N. J. A. Sloane
Aaron D. Wyner
Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill,
New Jersey 07974
Preface
This volume contains all of Claude Elwood Shannon's writings that we did not include in
his Collected Papers.*
* Claude Elwood Shannon: Collected Papers, edited by N. J. A. Sloane and A. D. Wyner, IEEE Press,
New York, 1993, xliv + 924 pp. ISBN 0-7803-0434-9.
Contents
Photograph of Claude Shannon at Bell Labs in May 1952. Caption: "In 1952, Claude E.
Shannon of Bell Laboratories devised an experiment to illustrate the capabilities of
telephone relays. Here, an electrical mouse finds its way unerringly through a maze,
guided by information remembered in the kind of switching relays used in dial telephone
systems. Experiments with the mouse helped stimulate Bell Laboratories researchers to
think of new ways to use the logical powers of computers for operations other than
numerical calculation."
Photograph of Claude Shannon and Dave Hagelbarger at Bell Labs in March 1955.
Caption: "Claude Shannon, the originator of Information Theory, at the board and Dave
Hagelbarger work out some equations needed. Their current projects include work on
automata, advanced types of computing machines which are able to perform various
thought functions."
Photograph of Claude Shannon taken in the 1980s. Photographer unknown.
Preface
Bibliography of Claude Elwood Shannon. Comments such as "Included in Part B" refer
to Parts A, B, C, D of the Collected Papers mentioned in the Preface.
This volume contains the following items. Bracketed numbers refer to the bibliography.
[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.
[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders,"
Report to National Defense Research Committee, Div. 7-311-M1, circa April,
1941, 37 pp. + 15 figs.
[9] "A Height Data Smoothing Mechanism," Report to National Defense Research
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.
[11] "Some Experimental Results on the Deflection Mechanism," Report to National
Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.
[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8,
1941, 5 pp. + 3 figs.
[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen
Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense
Research Committee, July 15, 1943, 9 pp.
[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944,
Bell Laboratories, 2 pp. + 3 Figs.
[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell
Laboratories, 1 p. + 1 fig.
[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.
[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript,
August 4, 1944, Bell Laboratories, 4 pp.
[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept.
1, 1945, Bell Laboratories, 114 pp. + 25 figs.
[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell
Laboratories, 17 pp.
[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in
Fire-Control Systems," Summary Technical Report, Div. 7, National Defense
Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159
and 166-167. AD 200795. Also in National Military Establishment Research and
Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by
[51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory
and Practice, Addison-Wesley, Reading, Mass., 1965.
[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-
Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM
46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.
[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946,
Bell Laboratories, 5 pp. + 1 fig.
[32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5
pp. + 1 fig.
[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5
pp.
[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15,
1948, 2 pp.
[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
[45] "Significance and Application [of Communication Research]," Symposium on
Communication Research, 11-13 October, 1948, Research and Development
Board, Department of Defense, Washington, DC, pp. 14-23, 1948.
[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell
Laboratories, 1 p.
[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18,
1948, Bell Laboratories, 2 pp.
[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6, 1948, Bell
Laboratories, 2 pp. + 2 Figs.
[49] "Information Theory," Typescript of abstract of talk for American Statistical
Society, 1949, 5 pp.
[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2
pp.
[59] "A Digital Method of Transmitting Information," Typescript, no date, circa
1950, Bell Laboratories, 3 pp.
[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.
[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-
9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.
[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7
pp.
[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.
[81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving
the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.
[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-
140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.
[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited
Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig.
[95] "Concavity of Transmission Rate as a Function of Input Probabilities,"
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.
[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology,
1956 and succeeding years. Contains the following sections:
"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of
martingales and related questions," 19 pp. "Some useful inequalities for
distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9
pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3
pp. "The number of sequences of a given length," 3 pp. "Characteristic for a
language with independent letters," 4 pp. "The probability of error in optimal
codes," 5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with feedback," 1 p.
"A lower bound for Pe when R > C," 2 pp. "A lower bound for Pe," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for
the memoryless feedback channel," 1 p. "Continuity of Pe opt as a function of
transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of
Pe to p," 2 pp. "Bound on Pe for random code by simple threshold argument," 4
pp. "A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a
function of transition probabilities," 1 p. "A geometric interpretation of
channel capacity," 6 pp. "Log moment generating function for the square of a
Gaussian variate," 2 pp. "Upper bound on Pe for Gaussian channel by
expurgated random code," 2 pp. "Lower bound on Pe in Gaussian channel by
minimum distance argument," 2 pp. "The sphere packing bound for the
Gaussian power limited channel," 4 pp. "The jT-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp. "The central limit theorem
with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the
distribution function," 5 pp. "Generalized Chebycheff and Chernoff
inequalities," 1 p. "Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp. "Error probability
bounds for noisy channels," 20 pp.
[105] "Reliable Machines from Unreliable Components," notes of five lectures,
Massachusetts Institute of Technology, Spring 1956, 24 pp.
[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by
W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.
[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a
lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.
[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture,
Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American
Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7
pp. + 8 figs.
[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.
Bibliography of Claude Elwood Shannon
[1] "A Symbolic Analysis of Relay and Switching Circuits," Transactions
American Institute of Electrical Engineers, Vol. 57 (1938), pp. 713-723.
(Received March 1, 1938.) Included in Part B.
[2] Letter to Vannevar Bush, Feb. 16, 1939. Printed in F.-W. Hagemeyer,
Die Entstehung von Informationskonzepten in der Nachrichtentechnik:
eine Fallstudie zur Theoriebildung in der Technik in Industrie- und
Kriegsforschung [The Origin of Information Theory Concepts in
Communication Technology: Case Study for Engineering Theory-
Building in Industrial and Military Research], Doctoral Dissertation,
Free Univ. Berlin, Nov. 8, 1979, 570 pp. Included in Part A.
[3] "An Algebra for Theoretical Genetics," Ph.D. Dissertation, Department
of Mathematics, Massachusetts Institute of Technology, April 15, 1940,
69 pp. Included in Part C.
[4] "A Theorem on Color Coding," Memorandum 40-130-153, July 8,
1940, Bell Laboratories. Superseded by "A Theorem on Coloring the
Lines of a Network." Not included.
[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender,"
Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp.
[7] "A Study of the Deflection Mechanism and Some Results on Rate
Finders," Report to National Defense Research Committee, Div. 7-311-M1,
circa April, 1941, 37 pp. + 15 figs. Included in this volume.
[8] "Backlash in Overdamped Systems," Report to National Defense
Research Committee, Princeton Univ., May 14, 1941, 6 pp. Abstract
only included in Part B.
[9] "A Height Data Smoothing Mechanism," Report to National Defense
Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941,
9 pp. + 9 figs. Included in this volume.
[10] "The Theory of Linear Differential and Smoothing Operators," Report
to National Defense Research Committee, Div. 7-313.1-M1, Princeton
Univ., June 8, 1941, 11 pp. Not included.
[11] "Some Experimental Results on the Deflection Mechanism," Report to
National Defense Research Committee, Div. 7-311-M1, June 26, 1941,
11 pp. Included in this volume.
[12] "Criteria for Consistency and Uniqueness in Relay Circuits,"
Typescript, Sept. 8, 1941, 5 pp. + 3 figs. Included in this volume.
[13] "The Theory and Design of Linear Differential Equation Machines,"
Report to the Services 20, Div. 7-311-M2, Jan. 1942, Bell Laboratories,
73 pp. + 30 figs. Included in Part B.
[14] (With John Riordan) "The Number of Two-Terminal Series-Parallel
Networks," Journal of Mathematics and Physics, Vol. 21 (August,
1942), pp. 83-93. Included in Part B.
[15] "Analogue of the Vernam System for Continuous Time Series,"
Memorandum MM 43-110-44, May 10, 1943, Bell Laboratories, 4 pp. +
4 figs. Included in Part A.
[16] (With W. Feller) "On the Integration of the Ballistic Equations on the
Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1,
National Defense Research Committee, July 15, 1943, 9 pp. Included in
this volume.
[17] "Pulse Code Modulation," Memorandum MM 43-110-43, December 1,
1943, Bell Laboratories. Not included.
[18] "Feedback Systems with Periodic Loop Closure," Memorandum MM
44-110-32, March 16, 1944, Bell Laboratories. Not included.
[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29,
1944, Bell Laboratories, 2 pp. + 3 Figs. Included in this volume.
[20] "Counting Up or Down With Pulse Counters," Typescript, May 31,
1944, Bell Laboratories, 1 p. + 1 fig. Included in this volume.
[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11
figs. Included in this volume.
[22] "The Best Detection of Pulses," Memorandum MM 44-110-28, June 22,
1944, Bell Laboratories, 3 pp. Included in Part A.
[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses,"
Typescript, August 4, 1944, Bell Laboratories, 4 pp. Included in this
volume.
[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-
110-02, Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs. Superseded
by the following paper. Included in this volume.
[25] "Communication Theory of Secrecy Systems," Bell System Technical
Journal, Vol. 28 (1949), pp. 656-715. "The material in this paper
appeared originally in a confidential report 'A Mathematical Theory of
Cryptography', dated Sept. 1, 1945, which has now been declassified."
Included in Part A.
[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945,
Bell Laboratories, 17 pp. Included in this volume.
[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and
Prediction in Fire-Control Systems," Summary Technical Report,
Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control,
Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in
National Military Establishment Research and Development Board,
Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R.
B. Blackman, Linear Data-Smoothing and Prediction in Theory and
Practice, Addison-Wesley, Reading, Mass., 1965. Included in this
volume.
[28] (With B. M. Oliver) "Communication System Employing Pulse Code
Modulation," Patent 2,801,281. Filed Feb. 21, 1946, granted July 30,
1957. Not included.
[29] (With B. D. Holbrook) "A Sender Circuit For Panel or Crossbar
Telephone Systems," Patent application circa 1946, application dropped
April 13, 1948. Not included.
[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of
Four-Terminal Unilateral Linear Networks Connected in Tandem,"
Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp.
+ 16 figs. Included in this volume.
[31] "Electronic Methods in Telephone Switching," Typescript, October 17,
1946, Bell Laboratories, 5 pp. + 1 fig. Included in this volume.
[32] "Some Generalizations of the Sampling Theorem," Typescript, March
4, 1948, 5 pp. + 1 fig. Included in this volume.
[33] (With J. R. Pierce and J. W. Tukey) "Cathode-Ray Device," Patent
2,576,040. Filed March 10, 1948, granted Nov. 20, 1951. Not included.
[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15,
1948, 5 pp. Included in this volume.
[35] "Systems Which Approach the Ideal as P/N -> oo," Typescript, March
15, 1948, 2 pp. Included in this volume.
[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
Included in this volume.
[37] "A Mathematical Theory of Communication," Bell System Technical
Journal, Vol. 27 (July and October 1948), pp. 379-423 and 623-656.
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[38] (With Warren Weaver) The Mathematical Theory of Communication,
University of Illinois Press, Urbana, IL, 1949, vi + 117 pp. Reprinted
(and repaginated) 1963. The section by Shannon is essentially identical
to the previous item. Not included.
[39] (With Warren Weaver) Mathematische Grundlagen der
Informationstheorie, Scientia Nova, Oldenbourg Verlag, Munich, 1976,
pp. 143. German translation of the preceding book. Not included.
[40] (With B. M. Oliver and J. R. Pierce) "The Philosophy of PCM,"
Proceedings Institute of Radio Engineers, Vol. 36 (1948), pp. 1324-
1331. (Received May 24, 1948.) Included in Part A.
[41] "Samples of Statistical English," Typescript, June 11, 1948, Bell
Laboratories, 3 pp. Included in this volume.
[42] "Network Rings," Typescript, June 11, 1948, Bell Laboratories, 26 pp.
+ 4 figs. Included in Part B.
[43] "Communication in the Presence of Noise," Proceedings Institute of
Radio Engineers, Vol. 37 (1949), pp. 10-21. (Received July 23, 1940
[1948?].) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Reprinted
in Proceedings Institute of Electrical and Electronic Engineers, Vol. 72
(1984), pp. 1192-1201. Included in Part A.
[44] "A Theorem on Coloring the Lines of a Network," Journal of
Mathematics and Physics, Vol. 28 (1949), pp. 148-151. (Received Sept.
14, 1948.) Included in Part B.
[45] "Significance and Application [of Communication Research],"
Symposium on Communication Research, 11-13 October, 1948, Research
and Development Board, Department of Defense, Washington, DC, pp.
14-23, 1948. Included in this volume.
[46] "Note on Certain Transcendental Numbers," Typescript, October 27,
1948, Bell Laboratories, 1 p. Included in this volume.
[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript,
Nov. 18, 1948, Bell Laboratories, 2 pp. Included in this volume.
[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6,
1948, Bell Laboratories, 2 pp. + 2 Figs. Included in this volume.
[49] "Information Theory," Typescript of abstract of talk for American
Statistical Society, 1949, 5 pp. Included in this volume.
[50] "The Synthesis of Two-Terminal Switching Circuits," Bell System
Technical Journal, Vol. 28 (Jan., 1949), pp. 59-98. Included in Part B.
[51] (With H. W. Bode) "A Simplified Derivation of Linear Least Squares
Smoothing and Prediction Theory," Proceedings Institute of Radio
Engineers, Vol. 38 (1950), pp. 417-425. (Received July 13, 1949.)
Included in Part B.
[52] "Review of Transformations on Lattices and Structures of Logic by
Stephen A. Kiss," Proceedings Institute of Radio Engineers, Vol. 37
(1949), p. 1163. Included in Part B.
[53] "Review of Cybernetics, or Control and Communication in the Animal
and the Machine by Norbert Wiener," Proceedings Institute of Radio
Engineers, Vol. 37 (1949), p. 1305. Included in Part B.
[54] "Programming a Computer for Playing Chess," Philosophical
Magazine, Series 7, Vol. 41 (No. 314, March 1950), pp. 256-275.
(Received Nov. 8, 1949.) Reprinted in D. N. L. Levy, editor, Computer
Chess Compendium, Springer-Verlag, NY, 1988. Included in Part B.
[55] "A Chess-Playing Machine," Scientific American, Vol. 182 (No. 2,
February 1950), pp. 48-51. Reprinted in The World of Mathematics,
edited by James R. Newman, Simon and Schuster, NY, Vol. 4, 1956, pp.
2124-2133. Included in Part B.
[56] "Memory Requirements in a Telephone Exchange," Bell System
Technical Journal, Vol. 29 (1950), pp. 343-349. (Received Dec. 7,
1949.) Included in Part B.
[57] "A Symmetrical Notation for Numbers," American Mathematical
Monthly, Vol. 57 (Feb., 1950), pp. 90-93. Included in Part B.
[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell
Laboratories, 2 pp. Included in this volume.
[59] "A Digital Method of Transmitting Information," Typescript, no date,
circa 1950, Bell Laboratories, 3 pp. Included in this volume.
[60] "Communication Theory — Exposition of Fundamentals," in "Report
of Proceedings, Symposium on Information Theory, London, Sept.,
1950," Institute of Radio Engineers, Transactions on Information
Theory, No. 1 (February, 1953), pp. 44-47. Included in Part A.
[61] "General Treatment of the Problem of Coding," in "Report of
Proceedings, Symposium on Information Theory, London, Sept., 1950,"
Institute of Radio Engineers, Transactions on Information Theory, No. 1
(February, 1953), pp. 102-104. Included in Part A.
[62] "The Lattice Theory of Information," in "Report of Proceedings,
Symposium on Information Theory, London, Sept., 1950," Institute of
Radio Engineers, Transactions on Information Theory, No. 1 (February,
1953), pp. 105-107. Included in Part A.
[63] (With E. C. Cherry, S. H. Moss, Dr. Uttley, I. J. Good, W. Lawrence and
W. P. Anderson) "Discussion of Preceding Three Papers," in "Report
of Proceedings, Symposium on Information Theory, London, Sept.,
1950," Institute of Radio Engineers, Transactions on Information
Theory, No. 1 (February, 1953), pp. 169-174. Included in Part A.
[64] "Review of Description of a Relay Computer, by the Staff of the
[Harvard] Computation Laboratory," Proceedings Institute of Radio
Engineers, Vol. 38 (1950), p. 449. Included in Part B.
[65] "Recent Developments in Communication Theory," Electronics, Vol.
23 (April, 1950), pp. 80-83. Included in Part A.
[66] German translation of [65], in Tech. Mitt. P.T.T., Bern, Vol. 28 (1950),
pp. 337-342. Not included.
[67] "A Method of Power or Signal Transmission To a Moving Vehicle,"
Memorandum for Record, July 19, 1950, Bell Laboratories, 2 pp. + 4
figs. Included in Part B.
[68] "Some Topics in Information Theory," in Proceedings International
Congress of Mathematicians (Cambridge, Mass., Aug. 30 - Sept. 6, 1950),
American Mathematical Society, Vol. II (1952), pp. 262-263. Included
in Part A.
[69] "Prediction and Entropy of Printed English," Bell System Technical
Journal, Vol. 30 (1951), pp. 50-64. (Received Sept. 15, 1950.)
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[70] "Presentation of a Maze Solving Machine," in Cybernetics: Circular,
Causal and Feedback Mechanisms in Biological and Social Systems,
Transactions Eighth Conference, March 15-16, 1951, New York, N.Y.,
edited by H. von Foerster, M. Mead and H. L. Teuber, Josiah Macy Jr.
Foundation, New York, 1952, pp. 169-181. Included in Part B.
[71] "Control Apparatus," Patent application Aug. 1951, dropped Jan. 21,
1954. Not included.
[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10
pp. Included in this volume.
[73] "A Mind-Reading (?) Machine," Typescript, March 18, 1953, Bell
Laboratories, 4 pp. Included in Part B.
[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM
53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs. Included
in this volume.
[75] "The Potentialities of Computers," Typescript, April 3, 1953, Bell
Laboratories. Included in Part B.
[76] "Throbac I," Typescript, April 9, 1953, Bell Laboratories, 5 pp.
Included in Part B.
[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell
Laboratories, 7 pp. Included in this volume.
[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.
Included in this volume.
[79] (With E. F. Moore) "Electrical Circuit Analyzer," Patent 2,776,405.
Filed May 18, 1953, granted Jan. 1, 1957. Not included.
[80] (With E. F. Moore) "Machine Aid for Switching Circuit Design,"
Proceedings Institute of Radio Engineers, Vol. 41 (1953), pp. 1348-
1351. (Received May 28, 1953.) Included in Part B.
[81] "Mathmanship or How to Give an Explicit Solution Without Actually
Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.
Included in this volume.
[82] "Computers and Automata," Proceedings Institute of Radio Engineers,
Vol. 41 (1953), pp. 1234-1241. (Received July 17, 1953.) Reprinted in
Methodos, Vol. 6 (1954), pp. 115-130. Included in Part B.
[83] "Realization of All 16 Switching Functions of Two Variables Requires
18 Contacts," Memorandum MM 53-1400-40, November 17, 1953, Bell
Laboratories, 4 pp. + 2 figs. Included in Part B.
[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum
MM 53-140-52, November 30, 1953, Bell Laboratories, 26 pp. + 5 figs.
Included in this volume.
[85] (With D. W. Hagelbarger) "A Relay Laboratory Outfit for Colleges,"
Memorandum MM 54-114-17, January 10, 1954, Bell Laboratories.
Included in Part B.
[86] "Efficient Coding of a Binary Source With One Very Infrequent
Symbol," Memorandum MM 54-114-7, January 29, 1954, Bell
Laboratories. Included in Part A.
[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude
Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1
Fig. Included in this volume.
[88] (With Edward F. Moore) "Reliable Circuits Using Crummy Relays,"
Memorandum 54-114-42, Nov. 29, 1954, Bell Laboratories. Published
as the following two items.
[89] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays
I," Journal Franklin Institute, Vol. 262 (Sept., 1956), pp. 191-208.
Included in Part B.
[90] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays
II," Journal Franklin Institute, Vol. 262 (Oct., 1956), pp. 281-297.
Included in Part B.
[91] (Edited jointly with John McCarthy) Automata Studies, Annals of
Mathematics Studies Number 34, Princeton University Press, Princeton,
NJ, 1956, ix + 285 pp. The Preface, Table of Contents, and the two
papers by Shannon are included in Part B.
[92] (With John McCarthy), Studien zur Theorie der Automaten, Munich,
1974. (German translation of the preceding work.)
[93] "A Universal Turing Machine With Two Internal States," Memorandum
54-114-38, May 15, 1954, Bell Laboratories. Published in Automata
Studies, pp. 157-165. Included in Part B.
[94] (With Karel de Leeuw, Edward F. Moore and N. Shapiro)
"Computability by Probabilistic Machines," Memorandum 54-114-37,
Oct. 21, 1954, Bell Laboratories. Published in [91], pp. 183-212.
Included in Part B.
[95] "Concavity of Transmission Rate as a Function of Input Probabilities,"
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories. Included
in this volume.
[96] "Some Results on Ideal Rectifier Circuits," Memorandum
MM 55-114-29, June 8, 1955, Bell Laboratories. Included in Part B.
[97] "The Simultaneous Synthesis of s Switching Functions of n Variables,"
Memorandum MM 55-114-30, June 8, 1955, Bell Laboratories. Included
in Part B.
[98] (With D. W. Hagelbarger) "Concavity of Resistance Functions,"
Journal Applied Physics, Vol. 27 (1956), pp. 42-43. (Received August 1,
1955.) Included in Part B.
[99] "Game Playing Machines," Journal Franklin Institute, Vol. 260 (1955),
pp. 447-453. (Delivered Oct. 19, 1955.) Included in Part B.
[100] "Information Theory," Encyclopedia Britannica, Chicago, IL, 14th
Edition, 1968 printing, Vol. 12, pp. 246B-249. (Written circa 1955.)
Included in Part A.
[101] "Cybernetics," Encyclopedia Britannica, Chicago, IL, 14th Edition,
1968 printing, Vol. 12. (Written circa 1955.) Not included.
[102] "The Rate of Approach to Ideal Coding (Abstract)," Proceedings
Institute of Radio Engineers, Vol. 43 (1955), p. 356. Included in Part A.
[103] "The Bandwagon (Editorial)," Institute of Radio Engineers,
Transactions on Information Theory, Vol. IT-2 (March, 1956), p. 3.
Included in Part A.
[104] "Information Theory," Seminar Notes, Massachusetts Institute of
Technology, 1956 and succeeding years. Included in this volume.
Contains the following sections:
"A skeleton key to the information theory notes," 3 pp. "Bounds on the
tails of martingales and related questions," 19 pp. "Some useful
inequalities for distribution functions," 3 pp. "A lower bound on the
tail of a distribution," 9 pp. "A combinatorial theorem," 1 p. "Some
results on determinants," 3 pp. "Upper and lower bounds for powers of
a matrix with non-negative elements," 3 pp. "The number of sequences
of a given length," 3 pp. "Characteristic for a language with
independent letters," 4 pp. "The probability of error in optimal codes,"
5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with
feedback," 1 p. "A lower bound for Pe when R > C," 2 pp. "A lower
bound for Pe," 2 pp. "Lower bound with one type of input and many
types of output," 3 pp. "Application of 'sphere-packing' bounds to
feedback case," 8 pp. "A result for the memoryless feedback channel,"
1 p. "Continuity of Pe opt as a function of transition probabilities," 1 p.
"Codes of a fixed composition," 1 p. "Relation of Pe to p," 2 pp.
"Bound on Pe for random code by simple threshold argument," 4 pp.
"A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2
pp. "Relations between probability and minimum word separation," 4
pp. "Inequalities for decodable codes," 3 pp. "Convexity of channel
capacity as a function of transition probabilities," 1 p. "A geometric
interpretation of channel capacity," 6 pp. "Log moment generating
function for the square of a Gaussian variate," 2 pp. "Upper bound on
Pe for Gaussian channel by expurgated random code," 2 pp. "Lower
bound on Pe in Gaussian channel by minimum distance argument," 2
pp. "The sphere packing bound for the Gaussian power limited
channel," 4 pp. "The ^-terminal channel," 7 pp. "Conditions for
constant mutual information," 2 pp. "The central limit theorem with
large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior
of the distribution function," 5 pp. "Generalized Chebycheff and
Chernoff inequalities," 1 p. "Channels with side information at the
transmitter," 13 pp. "Some miscellaneous results in coding theory," 15
pp. "Error probability bounds for noisy channels," 20 pp.
[105] "Reliable Machines from Unreliable Components," notes of five
lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp. Not
included.
[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes
taken by W. W. Peterson, Massachusetts Institute of Technology, Spring,
1956, 8 pp. Included in this volume.
[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel,"
notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956,
3 pp. Included in this volume.
[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a
lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
Included in this volume.
[109] "The Zero Error Capacity of a Noisy Channel," Institute of Radio
Engineers, Transactions on Information Theory, Vol. IT-2 (September,
1956), pp. S8-S19. Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[110] (With Peter Elias and Amiel Feinstein) "A Note on the Maximum Flow
Through a Network," Institute of Radio Engineers, Transactions on
Information Theory, Vol. IT-2 (December, 1956), pp. 117-119.
(Received July 11, 1956.) Included in Part B.
[111] "Certain Results in Coding Theory for Noisy Channels," Information
and Control, Vol. 1 (1957), pp. 6-25. (Received April 22, 1957.)
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[112] "Geometrische Deutung einiger Ergebnisse bei die Berechnung der
Kanal Capazitat" [Geometrical meaning of some results in the
calculation of channel capacity], Nachrichtentechnische Zeit. (N.T.Z.),
Vol. 10 (No. 1, January 1957), pp. 1-4. Not included, since the English
version is included.
[113] "Some Geometrical Results in Channel Capacity," Verband Deutsche
Elektrotechniker Fachber., Vol. 19 (II) (1956), pp. 13-15 =
Nachrichtentechnische Fachber. (N.T.F.), Vol. 6 (1957). English version
of the preceding work. Included in Part A.
[114] "Von Neumann's Contribution to Automata Theory," Bulletin American
Mathematical Society, Vol. 64 (No. 3, Part 2, 1958), pp. 123-129.
(Received Feb. 10, 1958.) Included in Part B.
[115] "A Note on a Partial Ordering for Communication Channels,"
Information and Control, Vol. 1 (1958), pp. 390-397. (Received March
24, 1958.) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[116] "Channels With Side Information at the Transmitter," IBM Journal
Research and Development, Vol. 2 (1958), pp. 289-293. (Received Sept.
15, 1958.) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[117] "Probability of Error for Optimal Codes in a Gaussian Channel," Bell
System Technical Journal, Vol. 38 (1959), pp. 611-656. (Received Oct.
17, 1958.) Included in Part A.
[118] "Coding Theorems for a Discrete Source With a Fidelity Criterion,"
Institute of Radio Engineers, International Convention Record, Vol. 7
(Part 4, 1959), pp. 142-163. Reprinted with changes in Information and
Decision Processes, edited by R. E. Machol, McGraw-Hill, NY, 1960,
pp. 93-126. Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[119] "Two-Way Communication Channels," in Proceedings Fourth Berkeley
Symposium Probability and Statistics, June 20 - July 30, 1960, edited by
J. Neyman, Univ. Calif. Press, Berkeley, CA, Vol. 1, 1961, pp. 611-644.
Reprinted in D. Slepian, editor, Key Papers in the Development of
Information Theory, IEEE Press, NY, 1974. Included in Part A.
[120] "Computers and Automation — Progress and Promise in the Twentieth
Century," Man, Science, Learning and Education. The Semicentennial
Lectures at Rice University, edited by S. W. Higginbotham, Supplement
2 to Vol. XLIX, Rice University Studies, Rice Univ., 1963, pp. 201-211.
Included in Part B.
[121] Papers in Information Theory and Cybernetics (in Russian), Izd. Inostr.
Lit., Moscow, 1963, 824 pp. Edited by R. L. Dobrushin and O. B.
Lupanova, preface by A. N. Kolmogorov. Contains Russian translations
of [1], [6], [14], [25], [37], [40], [43], [44], [50], [51], [54]-[56], [65],
[68]-[70], [80], [82], [89], [90], [93], [94], [99], [103], [109]-[111],
[113]-[119].
[122] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error
Probability for Coding on Discrete Memoryless Channels I,"
Information and Control, Vol. 10 (1967), pp. 65-103. (Received Jan. 18,
1966.) Reprinted in D. Slepian, editor, Key Papers in the Development
of Information Theory, IEEE Press, NY, 1974. Included in Part A.
[123] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error
Probability for Coding on Discrete Memoryless Channels II,"
Information and Control, Vol. 10 (1967), pp. 522-552. (Received Jan.
18, 1966.) Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in
Part A.
[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the
American Driver in England," typescript, All Souls College, Oxford,
Trinity term, 1978, 7 pp. + 8 figs. Included in this volume.
[125] "Claude Shannon's No-Drop Juggling Diorama," Juggler's World, Vol.
34 (March, 1982), pp. 20-22. Included in Part B.
[126] "Scientific Aspects of Juggling," Typescript, circa 1980. Included in
Part B.
[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp. Included in
this volume.
Cover Sheet for Technical Memoranda
Research Department
subject: The Use of the Lakatos-Hickman Relay in a
Subscriber Sender - Case 20878
ROUTING:
1 - Patent Dept. (letter 9/27/40)
2 - W. W. Ke^all, Case File
3 - T. C. Fry
4 - A. B. Clark
5 - B. D. Holbrook
6 - G. R. Stibitz
7 - G. V. King
8 - Miss Hanle
MM-40-130-179
date August 13, 1940
author C. E. Shannon
INDEX NO. S4.2
ABSTRACT
A study is made of the possibilities of using
the Lakatos-Hickman type relay for the counting, registering,
steering, and pulse apportioning operations in
a subscriber sender. Circuits are shown for the more
important parts of the circuit where it appears that the
new type relay would effect an economy.
The Use of the Lakatos-Hickman Relay in a Subscriber Sender -
Case 20878
August 15, 1940
MEMORANDUM FOR FILE
The Lakatos-Hickman type relay¹, using the relay springs as part of
the magnetic circuit, can be used as a very economical type of pulse
counter and registration device. In fact, one such relay with twenty
moving springs can count and register up to ten pulses, while the same
operation requires at least five ordinary relays, and some standard
circuits use as many as twenty to reduce the spring loading on the
relays and the contact loading in the pulsing circuit. It has been
suggested that this new type of relay might be used for some or all of
the many counting, steering, and registration circuits in a subscriber
type sender.
The present memorandum gives some circuits for accomplishing
this. The chief problem in the design of these circuits is that of
performing the various translating operations necessary in converting
the incoming pulses into group and brush selections, or P.C.I. pulses
as the case may be, without using more contact elements than are
available on the counting relay. Two different solutions are given
here. The first was made as economical as possible but at the cost of
one disadvantage. Under certain conditions of contact failure in the
thousands or hundreds register the sender will connect the subscriber
to an incorrect number rather than connecting to a tell-tale and giving
him a busy signal. The second circuit, which we will call the positive
action circuit², is designed to overcome this difficulty but does so at
the expense of more contacts and wiring. Some compromise between these
circuits may be the most desirable. The circuits by no means represent
a complete sender. It appears that the problems connected with the
office code (i.e. the first two or three digits) can be handled without
much difficulty. At any rate these circuits will depend on the type of
decoder used, and would represent a second stage in the design. We have
therefore designed what might be called a "four digit sender,"
considering only the problems arising in the thousands, hundreds, tens
and units digits. We also have omitted consideration of the parts of
the circuit used for control and supervisory purposes, since these can
be easily handled by existing circuits, and do not directly involve the
new type relay. Our chief purpose is to show that the new type counter
contains sufficient contact elements for most of the steering and
counting circuits of the subscriber sender. It is always possible to
add more contacts at any stage in the new type counter by the
arrangement of springs in Fig. 1, but this would be undesirable from
the standpoint of standardization. At any rate it was found that even
in the positive action circuit, only two stages in one register needed
more contacts than are already available, and two additional ordinary
relays were introduced here to carry the contact load.

¹See "Circuit Analysis for Lakatos-Hickman Type Relay", G. R. Stibitz,
MM-40-150-180, Jan. 15, 1940, Case 20878.
²This circuit was suggested by Mr. G. V. King.
It should be pointed out that an extremely simple and
economical sender (i.e., much simpler than those given here)
could be designed using the new type counter were it not for
the peculiar translation codes involved. Thus if we could start
"from scratch" and design translation codes particularly adapted
to the characteristics of the new relay, the circuits could be
made very simple indeed. Even using the existing codes, which
were constructed to simplify the present type circuits, the use
of the new counter allows a remarkable simplicity and economy.
The circuits were designed by a combination of common
sense and Boolean algebra methods. We will omit the details
involved in their design. Although it is possible that a few
superfluous elements remain, it is doubtful if they can be
simplified very much.
Figure 2 is a block diagram of the proposed sender.
In the present panel and crossbar senders, pulse counting is done in
the same circuit for each digit and the numbers are transferred from
this counting circuit to a set of registering circuits, one for each
digit, through an incoming steering chain. The registering circuits in
the panel type sender consist of a set of five ordinary relays per
digit, while in the crossbar system the A digit is registered on one or
two verticals of a crossbar switch. In Figure 2, on the other hand,
each digit has one of the new type counter relays, which acts both as a
pulse counter and as a register. The incoming steering chain steers the
incoming pulses to the correct counter-register rather than steering
the number recorded by the input pulse counter to a digit register. The
input steering chain may or may not be one of the new type counters.
The steering operation can be done with the new type counter, but it
appears to require special devices, as for example polarized springs,
in order to energize both windings of the register relays after
receiving a digit. Even using the present type of steering chain a
great simplification is possible, for only one wire, the pulsing lead,
needs to be steered to the various digit registers, rather than the
five leads of the present type sender. Another possibility is using a
new type counter to count the groups of pulses and operate a set of
relays $S_A$, $S_B$, $S_C$, $S_{Th}$, $S_H$, $S_T$, and $S_U$, which come
in after the A, B, C, Th, H, T, and U digits are received and energize
both coils of the corresponding registers.
After the digits are registered on the new type counters, these
numbers are translated by means of the contact interconnections into
the code corresponding to the incoming brush, incoming group, final
brush, tens, and units selections, which are represented by a ground
on one of the leads in the groups marked IB, IG, FB, T, and U,
respectively. These groups of leads are connected in sequence to the
revertive pulse counter by means of the revertive group counter. The
revertive pulse counter will be one of the new type relays and is
connected in such a way as to open the fundamental circuit and thus
stop the revertive pulsing when it reaches the first ground. The
revertive group counter, or revertive steering chain, of course steps
ahead after each group of revertive pulses through the action of a
slow release relay. This last steering operation cannot be done solely
with one of the new type relays, for it is necessary to steer ten
leads in the tens and units digits. It could be done, however, with a
new type counter in conjunction with four ordinary relays.
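The stopping rule for the revertive pulse counter can be captured in a few lines of Python. This is a hypothetical abstraction, not a transcription of the circuit: the function name `revertive_pulses` and the lead numbering are introduced here, and the model simply assumes that the first revertive pulse brings the counter onto lead 0, the second onto lead 1, and so on, with pulsing stopped at the first grounded lead.

```python
def revertive_pulses(grounded_leads, num_leads=10):
    """Count revertive pulses until the counter reaches a grounded lead.

    Hypothetical model: pulse k (k = 1, 2, ...) brings the revertive
    counter onto lead k - 1, and the fundamental circuit is opened at the
    first lead that carries a ground.  Returns None if no lead in range is
    grounded, i.e. nothing in this model would stop the pulsing.
    """
    for lead in range(num_leads):
        if lead in grounded_leads:
            # Reaching lead n takes n + 1 revertive pulses.
            return lead + 1
    return None


# A ground on lead 6 stops the pulsing on the seventh revertive pulse.
print(revertive_pulses({6}))  # 7
```

If two leads are grounded at once, say `{3, 4}`, the count still stops at 4; but if the ground on lead 3 is lost, the call with `{4}` returns 5, so the counter runs on to the next grounded lead instead of stopping where it should.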
In the case of a call to a manual office the outputs
of the digit registers are translated by a P.C.I. circuit into
the correct P.C.I. codes. This circuit, too, can make use of the
new type counter in the quadranting operation, i.e. in apportioning
four quadrants to each of the four digits to be transmitted.
This would be done with a sixteen stage counter (or, if it is
desirable to have all counters with ten stages, two of these could
be connected "in series") replacing the present sequence switch.
Of course there must be an interlock between the incoming
and revertive steering chains to prevent any selection being
made before sufficient information has been received. This can
be done by fairly standard methods.
A rough comparison can be made between the relay requirements
of the present panel type sender and the design proposed here.
Omitting parts of the circuit which would be substantially the same,
the requirements are listed below:

                       Present Panel Sender   Proposed Sender
Operation              Ordinary Relays        New Type Counters   Ordinary Relays
Input Counting         ?                      -
Input Steering         ?                      ?                   ?
Registration           ?                      ?
Revertive Counting     ?                      ?                   ?
Revertive Steering     10                     ?                   ?
Total                  ?                      ?
In addition, a sequence switch is replaced by a new type counter.
These figures are based on the positive action circuit. The
other circuit uses 6 ordinary relays. This comparison of the
numbers of relays involved shows only a small part of the saving,
however. The wiring and fundamental method of operation of the
new circuit is much simpler, which tends both toward economy and,
provided the new relay can be made sufficiently reliable, toward
elimination of faults and errors.
It is a little more difficult to give a quantitative
comparison of the proposed sender with the present crossbar type
sender due to the differences in the types of circuit elements
involved, but it appears that the saving would be of the same order
of magnitude.
The new type counter with ten stages acts like a series
of twenty relays which come in sequentially as the two coils of
the relay are alternately energized. Thus after n pulses the
first 2n relays are operated. If, after a series of pulses, only
one of the two coils on a counter remains energized, we can only
be sure of the contacts on that side. It was found that under
these conditions the number of contacts available was far too
small in all of the four registers for the various translating
operations necessary. We have therefore assumed the steering
circuit should be designed in such a way as to energize both
coils of a counter after it has received its series of pulses*.
This ensures the contacts on both sides, and each stage then has
the equivalent of two transfer contacts and two additional contacts
somewhat similar to a switchhook connection. Thus each stage may be
considered as a relay with the contacts available indicated in
Figure 3. Our circuit diagrams are drawn from this point of view.
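The counting behaviour described above (twenty relays coming in one after another as the two coils alternate, so that n pulses leave the first 2n relays operated) can be sketched as a small Python model. The class name `TenStageCounter` is hypothetical, and the model assumes that each dial pulse energizes the two coils once each; it ignores the holding of both coils at the end of a digit and all contact details.

```python
class TenStageCounter:
    """Toy model of the new type counter as twenty relays in sequence.

    Assumption: each dial pulse energizes the two coils once each in turn,
    and each coil energization brings in the next relay, so n pulses leave
    the first 2n relays operated (capped at twenty for ten stages).
    """

    def __init__(self, stages=10):
        self.stages = stages
        self.operated = 0  # relays operated so far, out of 2 * stages

    def _energize_next_coil(self):
        # The counter cannot advance past its last relay.
        if self.operated < 2 * self.stages:
            self.operated += 1

    def pulse(self):
        self._energize_next_coil()  # first coil
        self._energize_next_coil()  # second coil

    def stages_operated(self):
        # A stage counts as operated once both of its relays are in.
        return self.operated // 2


counter = TenStageCounter()
for _ in range(6):
    counter.pulse()
print(counter.operated, counter.stages_operated())  # 12 6
```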
For the convenience of the reader we will list the
various translation codes used in the sender. The incoming
brush selection depends only on the thousands digit and is
given by the following table:

Incoming Brush Selection    Thousands Digit
0                           0, 1
1                           2, 3
2                           4, 5
3                           6, 7
4                           8, 9

*See the memorandum "Circuit Arrangement for Counting Relay with
Mechanically Independent Contact Springs", by B. D. Holbrook,
MM-40-130-149, July 5, 1940, Case 22108-1.
The incoming group selection depends on both the
hundreds and thousands digits and is given by the following:

Thousands Digit    Hundreds Digit    Incoming Group Selection
even               < 5               0
even               ≥ 5               1
odd                < 5               2
odd                ≥ 5               3
The final brush selection depends only on the hundreds
digit. We have the following code:

Hundreds Digit    Final Brush Selection
0, 5              0
1, 6              1
2, 7              2
3, 8              3
4, 9              4
It should be remembered that an incoming brush, incoming
group, or final brush selection of n corresponds to n + 1
revertive pulses. The same remark applies to the tens and
units selections.
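The translation from a registered four-digit number to the five revertive selections can be sketched in Python. The function `panel_selections` is hypothetical. Incoming brush (thousands digit divided by 2) and final brush (hundreds digit modulo 5) follow the tables above; the incoming group rule (thousands parity combined with hundreds < 5 or ≥ 5, in the row order even/<5, even/≥5, odd/<5, odd/≥5) is an assumed reading of a partly illegible table. Each selection n is converted to n + 1 revertive pulses, as the text states.

```python
def panel_selections(number):
    """Translate a four-digit line number into revertive selections.

    Incoming brush and final brush follow the tables in the text; the
    incoming group rule is an assumption about a partly illegible table.
    Returns (selections, revertive_pulse_counts).
    """
    if not 0 <= number <= 9999:
        raise ValueError("expected a four-digit number")
    th, h, t, u = (int(d) for d in f"{number:04d}")
    selections = {
        "IB": th // 2,                              # 0,1 -> 0; 2,3 -> 1; ...
        "IG": 2 * (th % 2) + (1 if h >= 5 else 0),  # assumed row ordering
        "FB": h % 5,                                # 0,5 -> 0; 1,6 -> 1; ...
        "T": t,
        "U": u,
    }
    # A selection of n corresponds to n + 1 revertive pulses.
    pulses = {lead: n + 1 for lead, n in selections.items()}
    return selections, pulses


sel, pulses = panel_selections(6723)
print(sel)     # {'IB': 3, 'IG': 1, 'FB': 2, 'T': 2, 'U': 3}
print(pulses)  # {'IB': 4, 'IG': 2, 'FB': 3, 'T': 3, 'U': 4}
```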
Digits are sent to a call indicator by a series of positive
and negative pulses, four for each digit. Two different
codes are used for this, one for the thousands digit and the
other for the hundreds, tens, and units. The thousands code is
an additive one based on the numbers 1, 2, 4, and 8 as follows:

P.C.I. Code for Thousands Digit

Thousands       Quadrant
Digit           I     II    III   IV
1               -     0     0     0
2               0     -     0     0
3               -     -     0     0
4               0     0     -     0
5               -     0     -     0
6               0     -     -     0
7               -     -     -     0
8               0     0     0     -
9               -     0     0     -
0               0     0     0     0
Corresponding
Additive
Numbers         (1)   (2)   (4)   (8)
The sum of the numbers corresponding to the columns in which a
digit has the symbol - gives that digit, hence the additive
property of the code. In this table I, II, III, and IV refer
to the four pulses or quadrants. In the first and third quadrants
0 represents a ground and a - represents a positive pulse. In the
even quadrants 0 means a light negative pulse and the -, a heavy
negative pulse. We have chosen this representation of the code
for comparison with the P.C.I. circuit, in which four leads are
grounded or not in accordance with the above table. Thus if the
digit 6 is registered in the thousands place, leads II and III in
a group I, II, III, IV are grounded. The presence or absence of
these grounds is translated into positive or negative pulses by
two relays TS and RS.
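Because the thousands code is additive on 1, 2, 4, and 8, a digit's quadrant pattern is just its binary expansion, which a short Python sketch can make concrete. The function names are hypothetical, and two readings are assumed: that the weights 1, 2, 4, 8 belong to quadrants I to IV in that order (consistent with digit 6 grounding leads II and III) and that 0 is sent as 0000.

```python
QUADRANTS = ("I", "II", "III", "IV")
THOUSANDS_WEIGHTS = (1, 2, 4, 8)  # additive numbers for quadrants I-IV


def thousands_pattern(digit):
    """Quadrant pattern for a thousands digit: '-' where the lead is grounded.

    Assumes weights 1, 2, 4, 8 on quadrants I-IV in order and that 0 is
    coded as 0000; both are readings of the table, not certainties.
    """
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    # The weights are powers of two, so a bitwise test picks the columns.
    return "".join("-" if digit & w else "0" for w in THOUSANDS_WEIGHTS)


def grounded_leads(pattern):
    """Names of the P.C.I. leads that carry a ground for this pattern."""
    return [q for q, sym in zip(QUADRANTS, pattern) if sym == "-"]


def decode(pattern, weights=THOUSANDS_WEIGHTS):
    """Additive property: sum the numbers of the columns marked '-'."""
    return sum(w for sym, w in zip(pattern, weights) if sym == "-")


print(thousands_pattern(6), grounded_leads(thousands_pattern(6)))  # 0--0 ['II', 'III']
```

Since 1, 2, 4, and 8 are powers of two, every digit from 1 to 9 has exactly one such pattern, which is why decoding by summation is unambiguous here.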
The hundreds, tens, and units P.C.I. code is also additive,
based on the numbers 1, 2, 4, 5. Using the same conventions
it is represented by the following table:

P.C.I. Code for Hundreds, Tens, and Units Digits

H, T, or U      Quadrant
Digit           I     II    III   IV
1               -     0     0     0
2               0     -     0     0
3               -     -     0     0
4               0     0     -     0
5               0     0     0     -
6               0     -     -     0
7               0     -     0     -
8               -     -     0     -
9               0     0     -     -
0               0     0     0     0
Corresponding
Numbers         (1)   (2)   (4)   (5)
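Unlike the 1-2-4-8 code, the weights 1, 2, 4, 5 admit more than one way of forming some digits (5 is 5 or 1 + 4; 6 is 2 + 4 or 1 + 5), so the table, not the weights alone, fixes the code. The Python sketch below transcribes the patterns as read from the table (the row for 6 follows the text's statement that a 6 grounds leads II and III), checks that every row is additive, and lists the alternative decompositions. The names `HTU_CODE`, `non_additive_digits`, and `decompositions` are hypothetical.

```python
from itertools import combinations

HTU_WEIGHTS = (1, 2, 4, 5)  # additive numbers for quadrants I-IV
QUADRANTS = ("I", "II", "III", "IV")

# Quadrant patterns I-IV ('-' = grounded lead), as read from the table.
HTU_CODE = {
    1: "-000", 2: "0-00", 3: "--00", 4: "00-0", 5: "000-",
    6: "0--0", 7: "0-0-", 8: "--0-", 9: "00--", 0: "0000",
}


def non_additive_digits(code, weights):
    """Digits whose marked columns do not sum to the digit itself."""
    return [
        digit
        for digit, pattern in code.items()
        if sum(w for sym, w in zip(pattern, weights) if sym == "-") != digit
    ]


def decompositions(value, weights=HTU_WEIGHTS):
    """Every set of quadrants whose numbers add up to value."""
    found = []
    for r in range(1, len(weights) + 1):
        for combo in combinations(range(len(weights)), r):
            if sum(weights[i] for i in combo) == value:
                found.append(tuple(QUADRANTS[i] for i in combo))
    return found


print(non_additive_digits(HTU_CODE, HTU_WEIGHTS))  # []
print(decompositions(5))  # [('IV',), ('I', 'III')]
```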
The circuit for the tens or units register is shown in Figure 4.
The operation is quite obvious. In the case of a full mechanical
call, if 6, for example, were dialed in the tens place, the first
six relays are locked in, which places a ground on the lead marked
6. These leads are connected through the revertive steering chain to
the revertive counter, which reaches this ground after the seventh
revertive pulse. The presence of this ground operates a relay
which opens the fundamental circuit and stops the pulsing.
A ground is also put on leads II and III for a P.C.I. call.
The operation of the P.C.I. circuit will be described later.
The thousands and hundreds register is shown in Figure 5 for the
positive action circuit and in Figure 6 for the more economical
circuit. In Figure 5, many of the contacts do double duty,
translating both for P.C.I. and full mechanical calls. This is
done through a relay P which is operated for a manual call and
not for a mechanical call. In the hundreds register there were
not enough contacts available in the fifth and tenth stages.
The relays R and S are used to carry part of the contact load.
This circuit is designed so that one and only one of the IB, IG,
and FB leads is grounded for a given number. In case of a contact
failure none would be grounded and the corresponding commutator
would supposedly go to a telltale. In the circuit of Figure 6,
on the other hand, more than one of the IB, IG, or FB leads may
be grounded at the same time. Thus if the thousands digit is 6,
both 3 and 4 in the IB group are grounded. If the back contact
on 3 failed, the revertive pulse counter would not stop the pulsing
action at brush 3 as it should but would go on to the fourth brush.
However, this circuit is considerably simpler than Figure 5, and
does not appear worse from the standpoint of possible wrong
numbers than the present type of sender.
The P.C.I. circuit is shown in Figure 7. Z is a relay
which is operated in the odd quadrants and not in the even
quadrants. TS and RS are relays whose windings are connected
sequentially through the P.C.I. impulse chain to first the thousands
P.C.I. leads I, II, III, and IV, then the hundreds, etc., according
to the following table:

Digit   Pulsing Stage   Z    TS        RS
Th      1               Z    Th I      Th II
Th      2                    Th III    Th II
Th      3               Z    Th III    Th IV
Th      4                    H I       Th IV
H       5               Z    H I       H II
H       6                    H III     H II
H       7               Z    H III     H IV
H       8                    T I       H IV
T       9               Z    T I       T II
T       10                   T III     T II
T       11              Z    T III     T IV
T       12                   U I       T IV
U       13              Z    U I       U II
U       14                   U III     U II
U       15              Z    U III     U IV
U       16                             U IV
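The regular pattern in the stage table can be regenerated by a short Python sketch. The table suggests (this is a reading, not a statement in the memorandum) that TS always serves the odd-quadrant leads I and III and RS the even-quadrant leads II and IV, and that each relay is switched to its next lead one stage before that lead's quadrant is sent, so it holds each lead for two stages. The function `pci_stage_table` is hypothetical.

```python
def pci_stage_table(digits=("Th", "H", "T", "U")):
    """Rebuild the (stage, Z operated, TS lead, RS lead) rows of the table.

    Reading of the table: Z operates in odd stages; TS steps through the
    odd-quadrant leads (I, III) and RS through the even-quadrant leads
    (II, IV), each moving to its next lead one stage before it is used.
    """
    odd_leads = [f"{d} {q}" for d in digits for q in ("I", "III")]
    even_leads = [f"{d} {q}" for d in digits for q in ("II", "IV")]
    rows = []
    for stage in range(1, 4 * len(digits) + 1):
        z_operated = stage % 2 == 1
        ts_index = stage // 2        # stage 1 -> 0; stages 2,3 -> 1; ...
        rs_index = (stage - 1) // 2  # stages 1,2 -> 0; stages 3,4 -> 1; ...
        ts = odd_leads[ts_index] if ts_index < len(odd_leads) else None
        rs = even_leads[rs_index]
        rows.append((stage, z_operated, ts, rs))
    return rows


for row in pci_stage_table()[:4]:
    print(row)
```

In the last stage TS has no further lead to take, which matches the empty TS entry for stage 16.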
In the odd quadrants Z is operated, placing a ground on the
fundamental ring (FR). The fundamental tip (FT) is connected
through Z to either ground or positive battery according as
TS is operated or not. This depends, of course, on the condition
of the P.C.I. lead to which TS is connected at the time.
Similarly, in the even quadrants light or heavy voltage is
applied to FR according to the condition of RS, while FT is
grounded.
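Combining the two codes with the line conventions (odd quadrants: FR grounded, FT at ground for 0 or a positive pulse for -; even quadrants: FT grounded, FR given a light negative pulse for 0 or a heavy one for -) gives the full sixteen-quadrant signal for a number. The Python sketch below is illustrative only: `pci_signals` is a hypothetical name, it works at the level of the code tables rather than relay states, and it carries over the same assumptions as before (thousands weights 1, 2, 4, 8 on quadrants I to IV, thousands 0 sent as 0000).

```python
THOUSANDS_WEIGHTS = (1, 2, 4, 8)
HTU_CODE = {
    1: "-000", 2: "0-00", 3: "--00", 4: "00-0", 5: "000-",
    6: "0--0", 7: "0-0-", 8: "--0-", 9: "00--", 0: "0000",
}


def thousands_pattern(digit):
    # Assumed 1-2-4-8 weighting on quadrants I-IV; 0 sent as 0000.
    return "".join("-" if digit & w else "0" for w in THOUSANDS_WEIGHTS)


def pci_signals(number):
    """(FT, FR) line conditions for the sixteen P.C.I. quadrants of a number.

    Odd quadrants: FR grounded; FT at ground for '0', positive for '-'.
    Even quadrants: FT grounded; FR light negative for '0', heavy for '-'.
    """
    if not 0 <= number <= 9999:
        raise ValueError("expected a four-digit number")
    th, h, t, u = (int(d) for d in f"{number:04d}")
    patterns = [thousands_pattern(th)] + [HTU_CODE[d] for d in (h, t, u)]
    signals = []
    for pattern in patterns:
        for quadrant, sym in enumerate(pattern, start=1):
            if quadrant % 2 == 1:
                ft = "positive" if sym == "-" else "ground"
                signals.append((ft, "ground"))
            else:
                fr = "heavy negative" if sym == "-" else "light negative"
                signals.append(("ground", fr))
    return signals


for ft, fr in pci_signals(6723)[:4]:
    print(ft, fr)
```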
Figure 8 shows the revertive steering chain and revertive
pulse counter.
C. E. SHANNON
[Figures: FIG. 3; FIG. 4, TENS OR UNITS REGISTER; FIG. 7, P.C.I. CIRCUIT]
A STUDY OF THE DEFLECTION MECHANISM
AND SOME RESULTS ON RATE FINDERS
by
Claude E. Shannon
SUMMARY OF THE MOST IMPORTANT RESULTS
1. The deflection mechanism may be divided into three parts.
The first is driven by two shafts and has one shaft as output,
which feeds the second part. This unit has a single
shaft output which serves as input to the third part, whose
output is also a single shaft, used as the desired azimuth correction.
2. The first unit is a simple integrator. Its output rate is
3. The second part is the same circuit as previous rate finders.
Its presence appears to be detrimental to the operation of
the system from several standpoints. The output e of this part
satisfies i
• ■ x-f- y
Ll
4. The third and most important part of the machine satisfies
q + R 4 + L q - •
in which:
e = an input forcing function which, except for transients in
the second part and other small effects, is the function
whose rate is to be found.
q = the rate of e as found by the device. The output of the
mechanism is $\sin^{-1} q$.
R, L, S are positive constants depending on the gear ratios,
etc. in the machine.
The mechanism therefore acts like an R, L, C circuit in which
the differential inductance is a function of the current,
$\frac{L}{\sqrt{1 - q^2}}$.
The system can be critically damped for differential displacements
near at most two values of the current.
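The claim that critical damping occurs near at most two values of the current, together with item 9's remark about heavy overdamping near q = 0, can be illustrated numerically. The equation itself is not legible in this copy, so the Python sketch below rests on an assumption: small displacements about a current q are treated as a series R, L, C circuit with resistance R, elastance S (that is, 1/C), and differential inductance L/sqrt(1 - q^2). The function names are hypothetical.

```python
import math


def damping_ratio(q, R, L, S):
    """Damping ratio for small oscillations about the current q.

    Assumes a series R, L, C analogue with elastance S = 1/C and
    differential inductance L / sqrt(1 - q**2).
    """
    if not -1 < q < 1:
        raise ValueError("q must lie strictly between -1 and 1")
    l_eff = L / math.sqrt(1 - q * q)
    return R / (2 * math.sqrt(S * l_eff))


def critical_currents(R, L, S):
    """Values of q at which the damping ratio equals 1 (at most two)."""
    # zeta(q) = R * (1 - q^2)**0.25 / (2 * sqrt(S * L)), so zeta = 1 gives
    # 1 - q^2 = (2 * sqrt(S * L) / R)**4.
    ratio = (2 * math.sqrt(S * L) / R) ** 4
    if ratio > 1:
        return []   # underdamped for every q
    if ratio == 1:
        return [0.0]
    qc = math.sqrt(1 - ratio)
    return [-qc, qc]


print(damping_ratio(0.0, R=4, L=1, S=1))  # 2.0: heavily overdamped at q = 0
print(critical_currents(R=4, L=1, S=1))
```

Under this assumption the damping ratio is largest at q = 0 and falls symmetrically as |q| grows, so it can equal 1 only at a single pair of currents ±q_c, in line with the summary.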
Omitting the effect of backlash, the system is stable for any
initial conditions whatever, with a linear forcing function,
$e = At + B$. It will approach asymptotically, and possibly with
oscillation, a position where q is proportional to e. An error
function can be found which decreases at a rate $-R(q - q_0)^2$,
$q_0$ being the asymptotic value of q.
If the system is less than critically damped, ordinary gear
play type of backlash can and will cause oscillation. This
includes play in gears, adders, lead screws, rack and pinions,
and looseness of balls in the integrator carriages. The oscillation
is not unstable in the sense of being erratic, or growing
without limit, but is of a perfectly definite frequency and
amplitude. This type of backlash acts exactly like a peculiar
shaped periodic forcing function. Approximate formulas for
the frequency and amplitude of the oscillation are
r
2
and
/s2 I UoLd -A)2
<*0c
$B_1$ and $B_2$ being the amounts of backlash in the two driven shafts
as measured in a certain manner.
8. Elastic deformations of shafts and plates can be divided into two parts. One is exactly equivalent to the gear type of backlash and may be grouped with $B_1$ and $B_2$ above. The other has the effect of altering the parameters R, L, S of the circuit and also adding higher order derivatives with small coefficients. This will slightly alter the time constant and the natural frequency of the system.
9. The manner in which the arcsin function is obtained seems to me distinctly disadvantageous to the operation of the system for a number of reasons, chiefly since to eliminate backlash oscillation it requires high overdamping near $\dot q = 0$, and this slows down the response for low target speeds.
10. The general problem of rate finding and smoothing is considered briefly from two angles - as a problem in approximating a certain given transfer admittance and as a problem in finding the form of a differential equation. The first method, based on a linear differential equation, leads to tentative designs which I think would be an improvement over the present one. The second method indicates the possibility of still more improvement if non-linear equations can be satisfactorily analyzed.
ANALYSIS OF THE DEFLECTION MECHANISM
General Considerations. The deflection mechanism is a device designed to find $\delta_1$ mechanically from the formula
$$\sin\delta_1 = \dot\varepsilon_a\,t_p,$$
having one shaft whose rate of turning is $\dot\varepsilon_a$ and another whose angular position is $t_p$, and giving $\delta_1$ as the position of a shaft. The system is also supposed to smooth out small errors in $\dot\varepsilon_a$.
The mechanism, as actually constructed, is shown in Figure 1. By a rearrangement of adders, it may be drawn as shown in Figure 2. Incidentally, the device of rearranging and combining adder units is frequently useful in studying these systems. In this case it both clarifies the physical operation and simplifies the mathematical analysis. The box IV on the right of Fig. 1 represents two adders with, essentially, a common shaft. The output is equal to the sum of the inputs with the indicated signs prefixed. A variable associated with a shaft represents the angular position of that shaft unless specifically stated otherwise. Gears are omitted from the diagram but included as coefficients in the equations. It may also be worthwhile to point out that the best method of setting down the equations of such a system is usually the following:
1. Considering only the integrators and function devices, label the various shafts using the minimum number of variables, working backward from driven to driving shafts. Thus if the output of an integrator is labeled z, its displacement is $\dot z$ (assuming constant disk rate). If the output of an x to ln x gear is sin u, its input is $e^{\sin u}$. Working backward gives the differential instead of the integral form of the equation.
2. Now concentrate on the adders, grouping together as many as possible, and write the equations of constraint. These will be the equations of the system.
I find the use of electrical analogues very useful in understanding these devices and have used throughout a notation which emphasizes this idea.
As the machine is drawn in Fig. 2, it consists of three independently operating units. The output of the first is a single shaft serving as input to the second, the output of the second a single shaft feeding the third, and the output of this being a shaft used as $\delta_1$.
The operation is roughly as follows. Integrator I multiplies its disk rate by its displacement, so that the rate of turning of its output is $\dot y = t_p\,\dot\varepsilon_a$. The actual position of this y shaft can carry no significance. It is
$$y = \int t_p\,\dot\varepsilon_a\,dt + y_0,$$
a variable which depends on the entire previous history of the sighting telescopes, to say nothing of possible integrator slippage. At two different times, with a target at the same position and speed, this shaft would have entirely different angular positions but the same rate of turning.
The output of integrator I feeds into the middle part of the system, which is exactly the rate finder of most older directors. This part of the device seems to me not only superfluous but actually detrimental to the operation. It is equivalent to an R, L circuit (Fig. 3) with impressed voltage y and output x, the voltage across the inductance
3. A small response h(t) for the function g(t). High frequencies in g(t) appear practically undiminished and in the same phase in h(t) since the impedance is high compared to R.
Thus
$$x = \frac{L_1}{R_1}A + K e^{-R_1 t/L_1} + h(t).$$
In adder III, x is added to y in equal proportions to give e:
$$e = y + \frac{L_1}{R_1}A + K e^{-R_1 t/L_1} + h(t).$$
As we pointed out above, y already contains an irrelevant additive constant, so the addition of another, $\frac{L_1}{R_1}A$, which happens to be proportional to the target rate, is of no possible significance. The term $Ke^{-R_1t/L_1}$ is certainly only detrimental, being an unwanted transient. For a time I thought that the reason for the middle part of the machine was the final term h(t). For high frequencies this is approximately g(t), and might be used to buck out these high frequency following errors, much as was done in some early radio circuits to reduce a-c hum. However, a study of the design diagrams shows that the two error functions are actually in phase, as I have indicated in the equation, so that these high frequency errors are added, making the situation worse. Even if the phase of x were reversed on entering adder III, I think it doubtful whether the presence of this part of the system would be justifiable. It would be necessary to show that the frequencies were high enough that the two actually did cancel, and also that the disadvantages of the transient term did not outweigh the advantages obtained. Note that the middle part can function in no way as a rate finder. The right hand part of the machine does its own rate finding, as we will see, and the rate found by the middle part could not possibly be used because of the undetermined constant in y.
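The behaviour of the middle part can be checked numerically. The sketch below is an idealization not stated in the text: the error term h(t) is taken as absent and y as an exact ramp of rate A starting from rest. It integrates the R, L circuit of Fig. 3 with a Runge-Kutta step and compares the inductance voltage x with the closed form $x = \frac{L_1}{R_1}A\,(1 - e^{-R_1t/L_1})$, i.e. the constant $\frac{L_1}{R_1}A$ plus a transient with $K = -\frac{L_1}{R_1}A$. The function name and parameter values are hypothetical.

```python
import math

def inductance_voltage(t_end, R1, L1, A, dt=1e-3):
    """Integrate L1*di/dt + R1*i = y(t) with y = A*t and i(0) = 0 (classical RK4).

    Returns x = L1*di/dt = y - R1*i, the voltage across the inductance, at t_end.
    """
    def didt(t, i):
        return (A * t - R1 * i) / L1

    t, i = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = didt(t, i)
        k2 = didt(t + dt / 2, i + dt / 2 * k1)
        k3 = didt(t + dt / 2, i + dt / 2 * k2)
        k4 = didt(t + dt, i + dt * k3)
        i += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return A * t - R1 * i

R1, L1, A = 2.0, 0.5, 3.0   # hypothetical circuit constants and input rate
for t_end in (0.1, 0.5, 3.0):
    x = inductance_voltage(t_end, R1, L1, A)
    exact = (L1 / R1) * A * (1.0 - math.exp(-R1 * t_end / L1))
    print(f"t={t_end:4.1f}  x={x:.6f}  closed form={exact:.6f}")
```

After the transient dies out, x is just the constant $\frac{L_1}{R_1}A$, which carries no information that y does not already have.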
We proceed now to the third part of the machine, which is the major concern of the study. Concentrating on the adder IV, the equation of the system is obviously
$$L\frac{d}{dt}\sin^{-1}\dot q = e - Sq - R\dot q$$
or
$$Sq + R\dot q + \frac{L\ddot q}{\sqrt{1-\dot q^2}} = e.$$
This is the equation of a series R, L, C circuit with the inductance a function of the current passing through it. Inductance may be defined by the Lagrangian equations or by
$$\Lambda i = \int v_L\,dt,$$
$v_L$ being the voltage across the inductance, and it is clear from the above equation that
$$\Lambda i = L\sin^{-1} i \quad\text{or}\quad \Lambda = \frac{L\sin^{-1} i}{i},$$
where $i = \dot q$ is the current. This function varies as shown in Figure 4. For our work, however, a more useful parameter is what is sometimes called the differential inductance, which may be defined by
$$L_d = \frac{d(\Lambda i)}{di},$$
so that in our case
$$L_d = \frac{L}{\sqrt{1-i^2}}.$$
This inductance is useful when we have an equilibrium current $\dot q_0$ and are considering the effect of small variations about this equilibrium. Omitting second order terms, the system will be equivalent to one with constant R, L, C parameters, the inductance being taken as $L_d$. The variation of $L_d$ with current is shown in Figure 5. The action is the opposite of that of a "swinging" choke where, because of saturation, the differential inductance decreases with large currents.
The mechanical idea behind the operation of this system is quite simple. Suppose shaft e to be turning at a constant rate. The system will be in equilibrium if the displacement of integrator V is such as to make its output feeding into the adder equal and opposite to e, and the displacement of integrator VI is at zero. Under these conditions, shaft $\dot q$ measures the rate of e, and shaft V, the output of the device, the arcsin of this rate. If the rates are not correct, the adder changes the second derivative shaft in such a direction as to equalize the rates. The $\dot q$ shaft serves as a damper to prevent continual oscillation about the equilibrium position.
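To see this equalizing action numerically, here is a minimal sketch that integrates $Sq + R\dot q + L\ddot q/\sqrt{1-\dot q^2} = At + B$ with a classical fourth-order Runge-Kutta step. The constants S, R, L, A, B, the step size and the starting state are hypothetical; the clamp inside the square root only guards intermediate Runge-Kutta stages and is not part of the mechanism. With these values $\dot q$ should settle to $A/S$ and the output shaft to $\sin^{-1}(A/S)$.

```python
import math

def deflection_rhs(t, q, v, S, R, L, A, B):
    """(dq/dt, dv/dt) for S*q + R*q' + L*q''/sqrt(1 - q'^2) = A*t + B, with v = q'."""
    # The clamp only protects intermediate Runge-Kutta stages from |v| > 1.
    root = math.sqrt(max(0.0, 1.0 - v * v))
    return v, root * (A * t + B - S * q - R * v) / L

def simulate(S, R, L, A, B, q0=0.0, v0=0.0, t_end=40.0, dt=0.01):
    """Classical fourth-order Runge-Kutta integration; returns the final (t, q, v)."""
    t, q, v = 0.0, q0, v0
    for _ in range(int(round(t_end / dt))):
        k1 = deflection_rhs(t, q, v, S, R, L, A, B)
        k2 = deflection_rhs(t + dt / 2, q + dt / 2 * k1[0], v + dt / 2 * k1[1], S, R, L, A, B)
        k3 = deflection_rhs(t + dt / 2, q + dt / 2 * k2[0], v + dt / 2 * k2[1], S, R, L, A, B)
        k4 = deflection_rhs(t + dt, q + dt * k3[0], v + dt * k3[1], S, R, L, A, B)
        q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return t, q, v

S, R, L, A, B = 1.0, 2.0, 1.0, 0.5, 0.0   # hypothetical constants with |A/S| < 1
t, q, v = simulate(S, R, L, A, B)
print(f"rate shaft q' = {v:.6f} (A/S = {A / S}); output arcsin(q') = {math.asin(v):.6f} rad")
```

The final q should also lie on the equilibrium line $q = At/S + B/S - RA/S^2$.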
MATHEMATICAL THEORY (Backlash not Present)
Differential Operation
If e is turning at a constant rate and the system is at equilibrium, and then a small differential disturbance is applied to the system, it will clearly respond very nearly like an R, L, C circuit with constant parameters, the inductance used being the differential inductance for the equilibrium current $\dot q_0$,
$$L_{\text{eff}} = \frac{L}{\sqrt{1-\dot q_0^2}}.$$
Such a system has a time constant of
$$T = \frac{2L_{\text{eff}}}{R} = \frac{2L}{R\sqrt{1-\dot q_0^2}}.$$
It is critically damped if
$$R^2 = 4L_{\text{eff}}S = \frac{4LS}{\sqrt{1-\dot q_0^2}},$$
which, of course, only occurs at
$$\dot q_0 = \pm\sqrt{1 - \frac{16L^2S^2}{R^4}}.$$
For values of $\dot q_0$ greater in absolute value than this, the system is oscillatory; for values less, overdamped.
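The damping condition is easy to tabulate. The helper names below are hypothetical; they simply evaluate $L_{\text{eff}}$, the time constant $2L_{\text{eff}}/R$ and the critical rate $\sqrt{1 - 16L^2S^2/R^4}$ from the formulas above, and classify the linearized behaviour at an equilibrium rate $\dot q_0$ by the sign of $R^2 - 4L_{\text{eff}}S$. The constants are illustrative only.

```python
import math

def effective_inductance(L, qdot0):
    """Differential inductance L / sqrt(1 - qdot0**2) at the equilibrium rate (|qdot0| < 1)."""
    return L / math.sqrt(1.0 - qdot0 ** 2)

def time_constant(R, L, qdot0):
    """Decay time constant 2*L_eff/R of the linearized series circuit."""
    return 2.0 * effective_inductance(L, qdot0) / R

def critical_rate(R, L, S):
    """|qdot0| at which R**2 == 4*L_eff*S, or None when no real such rate exists."""
    x = 1.0 - 16.0 * L ** 2 * S ** 2 / R ** 4
    return math.sqrt(x) if x >= 0.0 else None

def damping_regime(R, L, S, qdot0, tol=1e-12):
    """Classify by the sign of R**2 - 4*L_eff*S (relative tolerance for 'critical')."""
    disc = R ** 2 - 4.0 * effective_inductance(L, qdot0) * S
    if abs(disc) <= tol * R ** 2:
        return "critical"
    return "overdamped" if disc > 0 else "oscillatory"

R, L, S = 2.0, 1.0, 0.9          # hypothetical constants
qc = critical_rate(R, L, S)      # sqrt(0.19), about 0.436
for qdot0 in (0.0, 0.2, qc, 0.6, 0.9):
    print(f"{qdot0:.3f}  T={time_constant(R, L, qdot0):.3f}  {damping_regime(R, L, S, qdot0)}")
```

As in the text, rates below the critical value give an overdamped response and rates above it an oscillatory one.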
Proof of General Stability with Linear e
In proving the stability of this system, I have used a method which may be new in some respects. It was suggested by the fact that in a non-dissipative mechanical system, the potential energy U is a minimum at a point where the system is differentially stable, and the method is, in a sense, a generalization of that criterion. It is not, however, limited to differential stability or to non-dissipative systems. Since the method may be of use in other investigations of this type, I will first describe it in general terms.
Suppose we have a differential equation system in which n variables and derivatives may be specified independently in the initial conditions. We will say that the system is stable for all initial conditions and all driving functions if any two solutions of the system with the same driving functions approach each other in the sense that
$$\lim_{t\to\infty}\sum_{i=1}^{n}|x_i - y_i| = 0,$$
where $x_1(t), x_2(t), \ldots, x_n(t)$ is one solution and $y_1(t), \ldots, y_n(t)$ the other. If this limit is zero for certain types of driving functions, we will say the system is stable for these functions.
Theorem: If a continuous function $Q(x_1 \ldots x_n, y_1 \ldots y_n, t)$ can be found having the following properties:
1. $Q \ge 0$ for all $x_i, y_i, t$, the equality holding if and only if $x_i = y_i$;
2. $dQ/dt \le 0$ at all times, when the $x_i$ and $y_i$ are solutions of the system with the same driving function;
3. It is impossible for Q to remain indefinitely $\ge \Delta > 0$;
Then the system is completely stable.
For the function Q is non-increasing but always $\ge 0$, and must therefore approach a limit $\Delta \ge 0$ as $t \to \infty$; but by 3, $\Delta > 0$ is impossible, hence $\Delta = 0$, and each $|x_i - y_i| \to 0$.
Conversely, it can be shown that if only a single forcing function is involved, and the system is stable for this function, a Q exists of the type described.
Roughly, the method is to find a "distance" or "error"
function Q between two solutions which is zero only when the so-
lutions are identical and which always decreases.
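As an example of this method it is easy to prove the complete stability of the ordinary R, L, C circuit with constant parameters without solving the equation. The differential equation is
$$Sq + R\dot q + L\ddot q = e$$
and we choose q and $\dot q$ as coordinates. Let two solutions be $q_1, \dot q_1$ and $q_2, \dot q_2$, and consider the function
$$Q = \frac{S}{2}(q_1 - q_2)^2 + \frac{L}{2}(\dot q_1 - \dot q_2)^2.$$
Condition 1 is obviously satisfied. Now
$$\frac{dQ}{dt} = S(q_1 - q_2)(\dot q_1 - \dot q_2) + L(\dot q_1 - \dot q_2)(\ddot q_1 - \ddot q_2) = -R(\dot q_1 - \dot q_2)^2 \le 0,$$
the last step using $L(\ddot q_1 - \ddot q_2) = -S(q_1 - q_2) - R(\dot q_1 - \dot q_2)$, which follows by subtracting the two differential equations.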
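The argument can be checked numerically. The sketch below uses hypothetical constants, an arbitrarily chosen driving function $e(t) = \sin t$ and two arbitrary initial states; it integrates two solutions of $Sq + R\dot q + L\ddot q = e$ with the same e and records $Q = \frac{S}{2}(q_1-q_2)^2 + \frac{L}{2}(\dot q_1 - \dot q_2)^2$ after every step. Q should never increase and should tend to zero.

```python
import math

S, R, L = 1.0, 0.5, 1.0            # hypothetical constant circuit parameters

def e(t):
    return math.sin(t)             # arbitrary driving function, the same for both solutions

def rhs(t, q, v):
    """(dq/dt, dv/dt) for S*q + R*q' + L*q'' = e(t)."""
    return v, (e(t) - S * q - R * v) / L

def rk4_step(t, q, v, dt):
    k1 = rhs(t, q, v)
    k2 = rhs(t + dt / 2, q + dt / 2 * k1[0], v + dt / 2 * k1[1])
    k3 = rhs(t + dt / 2, q + dt / 2 * k2[0], v + dt / 2 * k2[1])
    k4 = rhs(t + dt, q + dt * k3[0], v + dt * k3[1])
    q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, v

def Q(q1, v1, q2, v2):
    """The 'distance' function between two solutions."""
    return 0.5 * S * (q1 - q2) ** 2 + 0.5 * L * (v1 - v2) ** 2

t, dt = 0.0, 0.01
s1, s2 = (1.0, 0.0), (0.0, 1.0)    # two arbitrary initial states (q, q')
history = [Q(*s1, *s2)]
for _ in range(6000):              # integrate to t = 60
    s1 = rk4_step(t, *s1, dt)
    s2 = rk4_step(t, *s2, dt)
    t += dt
    history.append(Q(*s1, *s2))
print(history[0], history[-1])
```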
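$$\ldots + \frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)^2$$
Obviously the minimum of Q with respect to q occurs at
$$q = \frac{At}{S} + \frac{B}{S} - \frac{RA}{S^2}.$$
Also
$$\frac{\partial Q}{\partial \dot q} = \frac{L\left(\dot q - \frac{A}{S}\right)}{\sqrt{1-\dot q^2}},$$
which vanishes only for $\dot q = A/S$. It is readily verified that this is a minimum, and that Q is zero at this point for any t. Now
$$\frac{dQ}{dt} = S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)\left(\dot q - \frac{A}{S}\right) + \frac{L\ddot q}{\sqrt{1-\dot q^2}}\left(\dot q - \frac{A}{S}\right)$$
and
$$\frac{L\ddot q}{\sqrt{1-\dot q^2}} = At + B - Sq - R\dot q$$
if q and $\dot q$ satisfy
$$Sq + R\dot q + \frac{L\ddot q}{\sqrt{1-\dot q^2}} = At + B.$$
Hence
$$\frac{dQ}{dt} = \left(Sq - At - B + \frac{RA}{S}\right)\left(\dot q - \frac{A}{S}\right) + \left(\dot q - \frac{A}{S}\right)\left(At + B - Sq - R\dot q\right) = -R\left(\dot q - \frac{A}{S}\right)^2 \le 0.$$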
Note that this rate is identical with that found in the linear case. Incidentally, it was by working backward from this rate that a suitable function Q was first found.
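Because the first part of the expression for Q is not legible here, the following is a reconstruction rather than Shannon's own formula. Integrating $\partial Q/\partial\dot q$ from $A/S$ gives the $\dot q$-dependent part $L\left[\sqrt{1-(A/S)^2} - \sqrt{1-\dot q^2} - \frac{A}{S}\left(\sin^{-1}\dot q - \sin^{-1}\frac{A}{S}\right)\right]$. Adding $\frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)^2$ gives a Q with the properties used above. The sketch evaluates this Q along Runge-Kutta solutions with hypothetical constants. Q should never increase, and with R = 0 it should stay constant (the case of the equilevel curves of Figure 6).

```python
import math

def make_Q(S, R, L, A, B):
    """Return Q(q, v, t) for S*q + R*q' + L*q''/sqrt(1 - q'^2) = A*t + B (|A/S| < 1).

    v stands for q'. The v-part is L times the integral of (u - A/S)/sqrt(1 - u^2)
    from A/S to v, so that dQ/dv = L*(v - A/S)/sqrt(1 - v^2).
    """
    c = A / S
    def Q(q, v, t):
        x = q - A * t / S - B / S + R * A / S ** 2
        g = L * (math.sqrt(1 - c * c) - math.sqrt(1 - v * v)
                 - c * (math.asin(v) - math.asin(c)))
        return 0.5 * S * x * x + g
    return Q

def trajectory_Q(S, R, L, A, B, q0=0.0, v0=0.0, t_end=30.0, dt=0.01):
    """Integrate with RK4 and return the list of Q values along the solution."""
    Q = make_Q(S, R, L, A, B)
    def rhs(t, q, v):
        return v, math.sqrt(max(0.0, 1 - v * v)) * (A * t + B - S * q - R * v) / L
    t, q, v = 0.0, q0, v0
    values = [Q(q, v, t)]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(t, q, v)
        k2 = rhs(t + dt / 2, q + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, q + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = rhs(t + dt, q + dt * k3[0], v + dt * k3[1])
        q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        values.append(Q(q, v, t))
    return values

damped = trajectory_Q(1.0, 2.0, 1.0, 0.5, 0.0)    # hypothetical constants
lossless = trajectory_Q(1.0, 0.0, 1.0, 0.5, 0.0)  # R = 0
print(damped[0], damped[-1], min(lossless), max(lossless))
```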
For Q to approach a limit K > 0, it is necessary for $\dot q - A/S$ to approach zero, and q, therefore, to approach a linear function of t differing by a constant from its equilibrium value. But then, from the original differential equation, $\ddot q$ must approach a constant different from zero, which contradicts $\dot q \to A/S$. This does not, however, quite complete the stability proof, due to a certain mechanical peculiarity of the system. Let us plot the equilevel lines of Q against axes $X = q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}$ and $Y = \dot q$ (Figure 6).
The x to sin x gear in the actual mechanism has a limited movement, and is prevented from going too far by a slip clutch and stop. If $|\dot q| = K$, the stop prevents $|\dot q|$ from increasing any more. The original equation is replaced by
$$\ddot q = 0$$
until the pressure on the stop reverses. So far we have shown that under the original equation Q always decreases. In terms of our plot this means that if we start a solution inside the curve marked C, the solution will certainly converge to the equilibrium position, for the solution can never "escape" from C and hit one of the two lines $Y = \pm K$, where the differential equation changes. When we are not on
one of these lines a solution will, in fact, spiral inward in the clockwise sense, as may be seen by writing the differential equation in the form
$$S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right) + R\left(\dot q - \frac{A}{S}\right) = -\frac{L\ddot q}{\sqrt{1-\dot q^2}}.$$
Consider the signs of $\ddot q$ and $(\dot q - A/S)$ in the four quadrants about the equilibrium position. In I, for example, $(\dot q - A/S) > 0$ and the X coordinate of a solution must increase with t; $\ddot q < 0$, so $\dot q$ must decrease, giving a clockwise sense to the motion. Similarly the other quadrants may be verified. Some of the solutions starting outside of C will hit one of the lines, but the solution will still be stable. It is easy to show, by a study of the signs of the variables and their rates, that a solution can only hit the upper line to the left of the point $P_1$ with coordinates $X = \frac{R}{S}\left(\frac{A}{S} - K\right)$ and $Y = K$, and that if one does, it will move along the line to the right until it reaches $P_1$ and then return to the original equation. A similar situation holds for the lower line. If we should start a solution on the upper line to the right of $P_1$ it would leave the line immediately. The solution is always horizontal (i.e. $\ddot q = 0$) on the line through $P_1$, the equilibrium point and $P_2$.
If R = 0 the function Q is constant, since $dQ/dt = 0$, and therefore the solutions of the equation
$$Sq + \frac{L\ddot q}{\sqrt{1-\dot q^2}} = At + B$$
are the equilevel curves in Figure 6.
I have attempted in several different ways to generalize this proof for arbitrary input functions e(t), but so far have no completely rigorous proof. However, some of the arguments come so near as to make me almost certain of complete stability. It can be shown, for example, that two different solutions with the same e(t) cannot definitely diverge, i.e. $|q_1 - q_2| + |\dot q_1 - \dot q_2|$ cannot become and remain greater than some positive constant (assuming e and $\dot e$ bounded). Also, if two solutions get close together (with respect to both q and $\dot q$), they will certainly converge.
The Effect of Backlash
In order to understand how backlash can cause oscillation, let us first consider a much simplified case. Suppose we have a second order linear system which is less than critically damped with no backlash (Figure 7):
$$Sq + R\dot q + L\ddot q = e.$$
If, at t = 0, we suddenly impress e = E (constant) on the system ($q = \dot q = 0$), the response is a damped oscillation (Figure 8).
Now in the mechanical system there are only two driven shafts, A and B, and backlash only affects (directly) the operation of these. Probably the greatest is in the adder ... driving shaft A. Let us assume for a moment that this is the only backlash present and that its action is as follows. When shaft A reverses direction (i.e. when $\dot q = 0$) there is a short period during which it is held while the driving shaft turns an amount $B_1$ as measured from the e shaft. It is clear that the response of the system with backlash is the same as the response would be if there were no backlash and, at the time when q (previously decreasing) is about to increase, we turn the e shaft an amount $B_1$ in such a way as to keep q constant during this turning. Similarly, at the next reversal we give e an increment $-B_1$, keeping q constant through the period of backlash. In other words, the response is that of a system without backlash on which we impress as forcing function a wave which is about as shown in Figure 9.
If the periods of backlash are comparatively short, the small connecting portions (actually quadratic polynomials in time) will have little effect on the response. That is, we can assume a square-topped wave with little error in $\dot q$ or q, especially because of the smoothing operation of the integrators (or, said another way, because of the high impedance of the circuit to high frequencies).
Now suppose that there is a certain amount of backlash in shaft B. The action of this is to cause the carriage of the upper integrator to remain stationary for a small period when $\ddot q = 0$. The same effect would be achieved if, at this time, we suddenly impressed on e a pulse which held the lower integrator at zero and kept changing e at such a rate as to keep the lower integrator there. We keep the integrator at zero long enough that its output would have turned an amount equal to the backlash in B, and then suddenly return it to its proper value. This means that the area of the pulse must equal the backlash. The shape of this pulse would be a linear function of time, but here again it is not highly significant.
The entire system may thus be replaced by one which is free of backlash and subject to a driving function of the type shown in Figure 10, where $B_1$ is the backlash in A as measured from e and $B_2$ is the amount in B as measured from e (in the sense that if e covers an area $B_2$, shaft B moves an amount equal to its backlash).
It is easy to see from our diagram that this forcing function is in the correct phase to sustain the oscillation instead of allowing it to decay.
The fundamental component of this forcing function is easily found. We have
$$A_1 = \frac{2}{T}\int_0^T e\,\sin\frac{2\pi t}{T}\,dt.$$
e may be split into a sum, one term for the square wave and one for the pulse-like $B_2$ part. The $B_2$ pulse is all concentrated near the center of the sine wave, where the sine is nearly unity. Hence, approximately, with $f_0 = 1/T$,
$$A_1 \approx \frac{2}{T}\int_0^T \frac{B_1}{2}\left|\sin\frac{2\pi t}{T}\right|dt + \frac{4B_2}{T} = \frac{2B_1}{\pi} + 4f_0B_2.$$
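The approximation for $A_1$ can be checked by direct quadrature. The sketch assumes an idealized wave of the Figure 10 type: a square wave of amplitude $B_1/2$ (so that each reversal is a step of size $B_1$) plus narrow rectangular pulses of area $B_2$ centred on the peaks of the sine, positive at the positive peak and negative at the negative one. The pulse width, the function names and the numerical values are hypothetical.

```python
import math

def idealized_forcing(t, T, B1, B2, width):
    """Square wave of amplitude B1/2 plus rectangular pulses of area B2 at t = T/4 and 3T/4."""
    phase = (t % T) / T
    value = 0.5 * B1 if phase < 0.5 else -0.5 * B1
    if abs(phase - 0.25) * T < width / 2:
        value += B2 / width
    elif abs(phase - 0.75) * T < width / 2:
        value -= B2 / width
    return value

def fundamental_sine_coefficient(f, T, n=100_000):
    """A1 = (2/T) * integral over one period of f(t) * sin(2*pi*t/T), by the midpoint rule."""
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += f(t) * math.sin(2 * math.pi * t / T)
    return 2.0 / T * total * h

T, B1, B2 = 2.0, 0.01, 0.004        # hypothetical period and backlash amounts
A1 = fundamental_sine_coefficient(lambda t: idealized_forcing(t, T, B1, B2, 0.01 * T), T)
print(A1, 2 * B1 / math.pi + 4 * B2 / T)   # quadrature vs 2*B1/pi + 4*f0*B2
```

The small remaining difference comes from the finite pulse width, which is why the text calls the result approximate.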
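The period T of this oscillation is the natural damped period of the system, to within a small error of size comparable to the length of time during which backlash is effective. Hence its frequency is approximately
$$f_0 = \frac{1}{T} \approx \frac{1}{2\pi}\sqrt{\frac{S}{L_d} - \frac{R^2}{4L_d^2}},$$
and the magnitude of the fundamental component of the response q is
$$|q_1| \approx \frac{2B_1/\pi + 4f_0B_2}{\omega_0\sqrt{R^2 + \left(\omega_0 L_d - \dfrac{S}{\omega_0}\right)^2}}, \qquad \omega_0 = 2\pi f_0.$$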
Provided the quantity $2B_1/\pi + 4f_0B_2$ is small, the deflection mechanism will behave linearly about its equilibrium position and the above formulae should approximately hold. If $|\dot q_0| \ne 0$, the equilibrium value of inductance, $L/\sqrt{1-\dot q_0^2}$, would probably be as good as any to use, since the differential inductance is greater on one side and less on the other. At $\dot q_0 = 0$ the inductance is greater on each side, and a somewhat higher value should be used, depending on $2B_1/\pi + 4f_0B_2$. If the system is more than critically damped, q may or may not have an inflection point, depending on the initial conditions. If they are such that the driven shafts do not reverse, backlash cannot take effect and there should be no oscillation. However, if they do reverse once, the system may receive the equivalent of a "kick" in such a direction as to cause another reversal and so on, so that oscillation is set up. This question has not been fully settled, but if this happens, the amplitude formula above should still hold, while the frequency formula will not.
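The two estimates can be evaluated together. This sketch takes the amplitude as the steady-state charge amplitude of a series R, $L_d$, S circuit driven at $\omega_0$ by a sinusoid of amplitude $F = 2B_1/\pi + 4f_0B_2$, i.e. $|q_1| = F/(\omega_0|Z(\omega_0)|)$. The function name and example values are hypothetical. It returns None when the linearized circuit is not oscillatory, the case in which, as noted above, the frequency formula does not apply.

```python
import math

def backlash_oscillation(S, R, L, qdot0, B1, B2):
    """Estimate (f0, |q1|) of the backlash-sustained oscillation about equilibrium rate qdot0.

    Uses L_d = L/sqrt(1 - qdot0**2), f0 = sqrt(S/L_d - R**2/(4 L_d**2))/(2 pi) and
    |q1| = F / (w0 * sqrt(R**2 + (w0*L_d - S/w0)**2)) with F = 2*B1/pi + 4*f0*B2.
    """
    Ld = L / math.sqrt(1.0 - qdot0 ** 2)
    w0_sq = S / Ld - R ** 2 / (4.0 * Ld ** 2)
    if w0_sq <= 0.0:
        return None                    # at or beyond critical damping
    w0 = math.sqrt(w0_sq)
    f0 = w0 / (2.0 * math.pi)
    drive = 2.0 * B1 / math.pi + 4.0 * f0 * B2
    impedance = math.sqrt(R ** 2 + (w0 * Ld - S / w0) ** 2)
    return f0, drive / (w0 * impedance)

print(backlash_oscillation(S=1.0, R=0.5, L=1.0, qdot0=0.3, B1=0.002, B2=0.001))
```

The amplitude scales linearly with the backlash amounts. This matches the remark that reducing gear backlash reduces the oscillation proportionately but does not remove it.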
The question of "spring backlash", i.e. undesired effects due to elastic deformations of shafts and mounting plates, has been raised. According to Hooke's Law the angular strain in a shaft is proportional to the applied torque. This torque in a shaft whose position is x(t) can probably be very well approximated by an equation of the form
$$\tau = \pm\tau_1 + \tau_2\dot x + \tau_3\ddot x,$$
the first term, whose sign is that of $\dot x$, being due to a coulomb friction load, the second to a viscous friction load and the third an accelerating torque.
It is clear that the coulomb friction term $\tau_1$ can be combined with the ordinary gear type backlash treated above, and acts, therefore, like a periodic forcing function. The effect of the other terms is quite different: their presence causes small changes in the parameters R and L of the circuit and also adds higher derivatives to the equation. Let us consider only the spring in the shafts feeding ... (i.e. assume q driven ...). The equation becomes
$$(Sq - \rho_1\dot q - \rho_2\ddot q) + (R\dot q - \sigma_1\ddot q - \sigma_2\dddot q) + \frac{L\ddot q}{\sqrt{1-\dot q^2}} = e - \epsilon_1\dot e - \epsilon_2\ddot e$$
or
$$Sq + (R - \rho_1)\dot q + \left(\frac{L}{\sqrt{1-\dot q^2}} - \rho_2 - \sigma_1\right)\ddot q - \sigma_2\dddot q = e - \epsilon_1\dot e - \epsilon_2\ddot e = e_1(t).$$
Spring in the drive to q has a similar effect, although complicated by the non-circular sine gears.
If e is a linear function of t, so is $e_1$, and the forcing function thus contains nothing to create a sustained oscillation. The left-hand side differs only by small quantities from the ideal equation
$$Sq + R\dot q + \frac{L\ddot q}{\sqrt{1-\dot q^2}} = e_1$$
and will therefore surely approach the equilibrium solution of that equation.
Thus we see that the "spring type" of backlash cannot cause sustained oscillation as the "gear" type of backlash can. However, if the gear type is present, the spring type can aid oscillation by reducing the damping; it may be necessary to overdamp in some cases in order to get an effective critical damping.
It should be pointed out that the gear type of backlash may not be quite as simple as we have assumed, particularly in the shafts driving $L\ddot q/\sqrt{1-\dot q^2}$. If the integrator carriage load is large compared to the friction loads in the adders and gears, then we are probably justified in assuming that gear pressures in the drive only reverse when the driven shaft reverses. However, if this is not the case, a backlash effect can easily take place at other times, for example when one of the shafts feeding the adder reverses, without necessarily reversing the driven shaft. The situation could become quite complicated, the equivalent input function containing several different sized steps occurring at different times. However, the fundamental frequency should still be approximately the natural damped frequency of the system, provided the backlash effects are small and occur only during a small fraction of the time.
The fact that backlash can cause a sustained oscillation leads to a criticism of the design of the mechanism, in particular of the method whereby the arcsin function is obtained. Note that reducing the amount of gear backlash, $2B_1/\pi + 4f_0B_2$, will reduce the amplitude of oscillation proportionately, but apparently the only way to eliminate it completely is to at least critically damp the system for all equilibrium points, so that the shafts do not, in general, reverse direction. In the deflection mechanism as it stands, this would be distinctly disadvantageous, for if we critically damp at the maximum values of $|\dot q|$ (the governing points), the system will be much overdamped near $\dot q = 0$, and in fact for most values of $\dot q$, due to the shape of the inductance curve.
Another related argument against the manner of getting the arcsin is that the response to high frequency error functions depends on the value of $\dot q$. It seems to me that the treatment of error functions should be independent of the target speed - what is best for one will be best for another - since the prediction error we can tolerate is an absolute quantity, not dependent on the target speed. There may be some objection to this argument on the grounds that at higher target speeds the error function is apt to be larger, and hence the circuit should have a larger impedance, but even so it would only be accidental if the peculiar variation introduced by the sine gear were anything like an approximation to the desired variation.
Finally, a minor argument against the position of the sine gear is that the equation becomes so difficult to handle mathematically. A design of this type must be largely intuitive or experimental - there is not much chance of choosing the constants for the best operation by a mathematical formulation, or of determining the speed of response etc. analytically.
These difficulties might be avoided in several ways. The arcsin might, for example, be introduced as in Figure 11. No doubt the reason this was not done was that, with $|\dot q|$ near 1, running the sin x gear backward is not mechanically practical, the gearing-up ratio being too great. This objection could be overcome in two ways - either a new gear k arcsin x to x (k large) could be used and the parameters R, L, S all decreased by a factor of k (or the integrator disks might be speeded up in suitable ratios), or, if this were not mechanically feasible, a rapid response servo mechanism could be introduced in the output (Figure 12). This system can, by the way, be solved in closed analytic form when $\dot e$ is a constant, and reduced to a quadrature in any case. The essential feature of this circuit is that the functions of rate finding and smoothing, and of taking the arcsin, have been isolated. Each part can be designed to do its own job best without compromise. It may be noted that the arcsin circuit above also performs a smoothing operation which depends on target speed. By suitable choice of the parameters we can make this large or small as we desire.
The Ideal Rate Finder and Smoother
Let us consider the problem of rate finding and smoothing from a general standpoint and ask what mathematical operation a machine should perform to act as the "best possible" rate finder. Of course, this question has many answers, depending chiefly on what assumptions we make as to the input function,
and what mathematical limitations we put on the machine. We shall assume throughout that the input function e(t) consists of a series of linear parts with curved connecting portions and with a small superimposed error function, and that we only desire the rate during (that is, some time after the start of) a linear part. In this section we assume there are no limitations whatever on the machine - that we can build a machine to perform any operations we can describe, in particular those a mathematician might use to solve the problem. Now there is considerable experimental and theoretical justification for the theory that the best way to fit a curve of a given type to a set of points subject to an observational error is in the least square sense. If we assume this to be true in our case, and attempt to fit a straight line to the last a seconds before $t_1$ of the curve e(t), we must minimize the integral
$$I = \int_{t_1-a}^{t_1}\left[e - (At + B)\right]^2 dt$$
with respect to A and B. The quantity a represents the length of the curve used in the fitting process. We would like to use as much of the curve as actually represents a linear segment to get the best accuracy, but certainly no more. A person doing the curve fitting could look at e(t) and see fairly well where the curve showed a real tendency to depart from linearity, and select accordingly. Mathematically it could be done as follows. Suppose the standard deviation of the error is $\sigma$ and that errors of more than, say, $4\sigma$ are almost certainly due to a significant departure from linearity in the curve. We could choose a such that it is as large as possible without making the error $|e - (At + B)|$ (A, B chosen to minimize I) greater than $4\sigma$ for $t_1 - a \le t \le t_1$. In other words, we use as much of the curve as we can assume linear within observational errors. As a final refinement of the solution it might be desirable to include a weighting function W(t, a) in the integral I, weighting the more recent values more heavily. The final evaluation of the rate is then the value of A given when we minimize the function
$$I(A, B, a) = \int_{t_1-a}^{t_1}\left[e - (At + B)\right]^2 W(t, a)\,dt$$
on A and B, a fixed, giving A and B as functions of a, and then choose a as large as possible with
$$|e - (At + B)| \le K\sigma, \qquad t_1 - a \le t \le t_1.$$
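A discrete version of this procedure is easy to write down, even though it would be hard to mechanize. The sketch below makes several hypothetical choices:

- the integrals are replaced by sums over sampled values;
- $W(t, a)$ rises linearly from 1 at $t_1 - a$ to 2 at $t_1$;
- a is drawn from a list of candidate window lengths, and the largest one whose residuals all stay within $K\sigma$ is kept.

With all weights equal to 1, the fit reduces to the ordinary least-squares line.

```python
def weighted_line_fit(ts, es, ws):
    """Minimize sum w*(e - (A*t + B))**2 over A, B; returns (A, B)."""
    sw = sum(ws)
    tbar = sum(w * t for w, t in zip(ws, ts)) / sw
    ebar = sum(w * e for w, e in zip(ws, es)) / sw
    stt = sum(w * (t - tbar) ** 2 for w, t in zip(ws, ts))
    ste = sum(w * (t - tbar) * (e - ebar) for w, t, e in zip(ws, ts, es))
    A = ste / stt
    return A, ebar - A * tbar

def adaptive_rate(ts, es, t1, windows, sigma, K=4.0):
    """Return (A, a) for the longest window a whose weighted-fit residuals stay within K*sigma."""
    best = None
    for a in sorted(windows):
        idx = [i for i, t in enumerate(ts) if t1 - a - 1e-9 <= t <= t1 + 1e-9]
        if len(idx) < 3:
            continue
        wt = [ts[i] for i in idx]
        we = [es[i] for i in idx]
        ww = [1.0 + (t - (t1 - a)) / a for t in wt]   # hypothetical W(t, a)
        A, B = weighted_line_fit(wt, we, ww)
        if max(abs(e - (A * t + B)) for t, e in zip(wt, we)) <= K * sigma:
            best = (A, a)
    return best

# Hypothetical input: slope 2 until t = 5, slope 0.5 afterwards, plus a small alternating error.
ts = [0.1 * k for k in range(101)]
es = [(2.0 * t if t <= 5.0 else 10.0 + 0.5 * (t - 5.0)) + 0.01 * (-1) ** k
      for k, t in enumerate(ts)]
rate, a = adaptive_rate(ts, es, t1=10.0, windows=[0.5 * k for k in range(1, 21)], sigma=0.01)
print(rate, a)
```

Here the window stops at the corner near t = 5, and the rate found is close to the slope 0.5 of the last linear part.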
This solution can be put into a more explicit form, but even when greatly simplified it appears that it would be quite difficult to carry out the calculations accurately by mechanical means. The main difficulty is that apparently such a machine must be capable of remembering exactly the past history of an arbitrary function, e or something derived from it. The only methods I know of doing this are quite inaccurate, or else very complex, and it seems likely that the gain in mathematical precision of the above formulation would be more than offset by a loss in mechanical precision.
Differential Analyzer Types of Machines
To become a bit more practical, let us now confine our attention to machines of what might be called the differential analyzer type. By this we mean machines constructed of a finite combination of adders, integrators, and function elements (e.g. non-circular gears). Two shafts, e(t) and kt, enter the machine and one shaft, u(t), leaves the machine. It can be shown that any such system must satisfy a differential equation of the type
$$f(q, \dot q, \ldots, q^{(n)}, t) = e(t)$$
with
$$u(t) = q^{(i)}(t).$$
First, we ask what can be said about the form of this equation to make the machine act as a satisfactory rate finder in our sense.
1. With the same initial conditions and the same e(t), the machine should certainly respond the same, independent of the time of start; hence f does not depend on t.
2. When e = At + B the equation must have an equilibrium solution
$$q^{(i)} = A, \qquad q^{(i+1)} = 0, \qquad q^{(i-1)} = At + C.$$
If i > 1, the carriage of an integrator will be continuously moving in the equilibrium condition. This does not seem practical, for the initial conditions may be anything depending on past history, and the integrator would surely go off scale in many cases. Obviously, from the equilibrium solution, i is not 0, for this would imply a constant equal to a linear function of time. Hence i = 1 and
$$\dot q = u(t).$$
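3. Let
$$f(x, y) \equiv f(x, y, 0, \ldots, 0).$$
Due to the equilibrium solution,
$$f(At + C, A) = At + B$$
for all A, B, t. Differentiating with respect to t,
$$\frac{\partial f(x, y)}{\partial x}\,A = A, \qquad \frac{\partial f}{\partial x} = 1,$$
so that
$$f(x, y) = x + h(y).$$
4. Assuming f is fairly "well behaved", we have near $\ddot q = \dddot q = \cdots = q^{(n)} = 0$ (i.e. near equilibrium)
$$f \approx f(q, \dot q, 0, 0, \ldots, 0) + \sum_{k=2}^{n}\frac{\partial f}{\partial q^{(k)}}\,q^{(k)} = q + h(\dot q) + a_2\ddot q + \cdots + a_n q^{(n)}$$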
and the differential operation depends on the coefficients $a_2 \ldots a_n$ and $h(\dot q)$. As this differential operation should not depend on t, the $a_k$ must be independent of q, for in equilibrium q changes with t. They may depend on $\dot q$, however, in which case the differential operation depends on the target speed, which may or may not be desirable. In the deflection mechanism this is the case: $a_2$ is proportional to $1/\sqrt{1-\dot q^2}$.
5. With $\dot q$ near A the above reduces to
$$f \approx q + a_1\dot q + a_2\ddot q + \cdots + a_n q^{(n)} + b,$$
where $a_1 = h'(A)$ and $b = h(A) - Ah'(A)$. To eliminate backlash oscillation the roots of the characteristic equation $1 + a_1p + a_2p^2 + \cdots + a_np^n = 0$ should all be real, and for stability all should be negative, for all desired A.
6. For complete stability there are no doubt further requirements on the form of f. This problem, however, is still unsolved.
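As an illustration of items 4 and 5, take the deflection mechanism itself, divided through by S so that the coefficient of q is 1. Then $h(\dot q) = (R/S)\dot q$ and $a_2 = L/(S\sqrt{1-\dot q^2})$. The sketch below, with hypothetical constants and function names, does three things:

- computes $a_1 = h'(A)$ by a central difference;
- computes $b = h(A) - Ah'(A)$;
- tests whether both roots of $1 + a_1p + a_2p^2 = 0$ are real and negative.

With these constants the check fails once $|\dot q|$ exceeds about 0.436, matching the critical-damping rate found earlier.

```python
import cmath
import math

def linearize(h, A, eps=1e-6):
    """a1 = h'(A) by central difference, and b = h(A) - A*h'(A)."""
    a1 = (h(A + eps) - h(A - eps)) / (2.0 * eps)
    return a1, h(A) - A * a1

def second_order_roots(a1, a2):
    """Both roots of 1 + a1*p + a2*p**2 = 0 (a2 != 0), as complex numbers."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)

def real_and_negative(roots, tol=1e-12):
    return all(abs(r.imag) <= tol and r.real < 0.0 for r in roots)

S, R, L = 1.0, 2.0, 0.9                        # hypothetical constants
h = lambda y: (R / S) * y                      # damping part of f after dividing by S
for A in (0.0, 0.3, 0.6):
    a1, b = linearize(h, A)
    a2 = L / (S * math.sqrt(1.0 - A * A))      # a2 depends on the rate, as noted in item 4
    print(A, round(a1, 6), round(b, 6), real_and_negative(second_order_roots(a1, a2)))
```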
The above are only requirements on the form of f so that it actually does find a satisfactory rate. To find the best form of f would require a very elaborate mathematical analysis, if possible at all.
If we restrict our machine still further and assume a linear differential equation with constant coefficients, it is possible to give a fairly rational analysis leading to the best values of the coefficients. The question is this. Given the equation
$$a_0q + a_1\dot q + \cdots + a_nq^{(n)} = e,$$
what values of the coefficients $a_0 \ldots a_n$ give the best rate-finding and smoothing properties? From what we said above, it seems that the characteristic equation
$$a_0 + a_1p + \cdots + a_np^n = 0$$
should have only real negative roots and that the rate found will be $\dot q$. We may normalize the equation by assuming $a_0 = 1$, so that $\dot q$ is actually the rate and not merely proportional to it. In the Heaviside symbolic notation, we have
$$\dot q = \frac{p\,e}{(b_1p + 1)(b_2p + 1)\cdots(b_np + 1)},$$
writing the polynomial in the factored form. The $b_k$ are positive real numbers and are the time constants in the transient part of the response. We assume the $b_k$ arranged in increasing magnitude.
Let us frame the problem as follows. Keeping the speed of response of the circuit the same, what values of the $b_k$ give the best attenuation of the error function? Of course, the trouble appears in trying to decide what we mean by keeping the speed of response the same. One answer is that we keep the maximum time constant, that is $b_n$, the same. This may be partially justified on the following grounds:
1. For "almost all" initial conditions, the term $A_n e^{-t/b_n}$ will eventually dominate the transient response as $t \to \infty$, the other terms becoming arbitrarily small in comparison. The only time when this fails is when the coefficient $A_n$ happens to come out zero.
2. In the worst cases (other coefficients small in comparison) the $b_n$ term dominates for all t, and the machine should perhaps be designed with the worst conditions as governing.
3. If we use this criterion, it is easy to show that for best attenuation of error frequencies all the $b_k$ should be equal. For the magnitude of the transfer admittance (e to q') is
$$|T(j\omega)| = \frac{\omega}{\prod_{k=1}^{n} \sqrt{1 + b_k^2 \omega^2}},$$
which is obviously smallest, for all frequencies, when each $b_k$ is made as large as possible. That is, each $b_k = b_n$, the maximum.
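As a numerical check of this argument, the sketch below evaluates |T(jω)| for two hypothetical third-order designs that share the same largest time constant, b_n = 2 s: one with spread time constants (0.5, 1 and 2 s) and one with all three equal to 2 s. The time constants and frequencies are illustrative choices, not values taken from the report.

```python
import math

def rate_gain(omega, time_constants):
    """Magnitude of T(j*omega) = j*omega / prod(1 + j*omega*b_k), the e-to-q' admittance."""
    gain = omega
    for b in time_constants:
        gain /= math.sqrt(1.0 + (b * omega) ** 2)
    return gain

# Hypothetical designs with the same largest time constant b_n = 2.0 s.
SPREAD = (0.5, 1.0, 2.0)
EQUAL = (2.0, 2.0, 2.0)

for omega in (0.5, 1.0, 2.0, 5.0):  # illustrative error-function frequencies, rad/s
    print(f"omega = {omega:3.1f}:  spread {rate_gain(omega, SPREAD):.4f}   "
          f"equal {rate_gain(omega, EQUAL):.4f}")
```

At every frequency the equal-time-constant design passes less of the error function, since each factor in the denominator can only grow when a $b_k$ is raised to $b_n$.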
Another way the "same speed of response" might be interpreted is in terms of the expected area under the transient time curve. Keeping the standard deviation of this area constant seems to give the same evaluation of the $b_k$ as above, but there are certain statistical assumptions in my proof that may render it invalid.
If the characteristic equation has real roots, it may be set up nicely as in Figure 13. This circuit appears to have an advantage from the backlash point of view over the more obvious one shown in Figure 14.
It seems quite possible, however, that the use of a nonlinear equation could offer a real advantage. Consider the equation
$$S\,q + K\,q' + \cdots = e,$$
where the three coefficients are functions of [illegible]. When the system is at equilibrium it acts approximately like
$$S(0)\,q + K(0)\,q' + \cdots = e,$$
and these three constants could be adjusted to give critical damping and a good attenuation of the error function frequencies. On the other hand, when we are not at or near equilibrium, [illegible] is (usually) considerably different from zero. The values of the three coefficients could then be adjusted to give a very rapid response, and thus approach the equilibrium position faster. It is possible, however, that there is some fundamental error in this reasoning; for example, that an attempt to do this would necessarily cause oscillation.
A HEIGHT DATA SMOOTHING MECHANISM
Claude E. Shannon
5/26/41
The schematic diagram of a new type of height data smoothing mechanism is shown in Figure 1. The discontinuous height data e(t) is fed into the input shaft at intervals. This drives a differential, connected also to the ball carriage and roller of an integrator whose disk is turned by a constant speed motor. A correcting hand wheel and the integrator roller feed another differential, whose output is the output of the device. The output and input of the machine are compared through a differential feeding a dial. The operator is supposed to turn the hand wheel in such a way that the positive and negative oscillations of the dial about zero are equal.
The actual height of the target h(t) is a continuous function of time, and we may assume that just after each reading e(t) is an approximation to this. Thus h(t) and e(t) might be as shown in Figure 2.
The shaft y(t) clearly satisfies the equation
$$y + \frac{1}{m}\,y' = e(t). \qquad (1)$$
The x shaft satisfies
$$x(t) = y(t) + c(t) \qquad (2)$$
and the dial reads
$$D(t) = e(t) - x(t). \qquad (3)$$
During the period between height readings the position of the e(t) shaft is constant, say $e(t_n)$, the reading taken at $t_n$. Then
$$y + \frac{1}{m}\,y' = e(t_n), \qquad y = e(t_n) + A_n e^{-m(t - t_n)}, \qquad t_n \le t \le t_{n+1}.$$
Since y is obviously continuous, it will follow a curve consisting of a series of connected exponentials, each with the same time constant, 1/m. The continuity of the curve implies
$$A_n = y(t_n) - e(t_n).$$
Assuming the intervals between readings are the same, say a seconds, the responses y for two different time constants, $ma = \ln 2$ and $ma = \ln 10$, are shown in Figure 3.
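The piecewise solution can be stepped directly from one reading to the next. The sketch below, an illustration rather than anything in the report, applies $y(t_{n+1}) = e(t_n) + [y(t_n) - e(t_n)]\,e^{-ma}$ to a hypothetical sequence of readings for the two cases of Figure 3. With ma = ln 2 half of the remaining gap to the current reading survives each interval; with ma = ln 10 only a tenth does.

```python
import math

def y_before_each_reading(readings, a, m, y0=0.0):
    """Value of y just before each new reading, for y + y'/m = e(t) with e(t)
    held at each reading for a seconds (piecewise exponential solution)."""
    y = y0
    values = []
    for e_n in readings:
        y = e_n + (y - e_n) * math.exp(-m * a)  # continuity carries y into the next interval
        values.append(y)
    return values

a = 1.0                                    # seconds between readings (illustrative)
readings = [0.0, 1.0, 1.0, 1.0, 3.0, 3.0]  # hypothetical height readings
print(y_before_each_reading(readings, a, math.log(2) / a))   # ma = ln 2: smoother, more lag
print(y_before_each_reading(readings, a, math.log(10) / a))  # ma = ln 10: follows e closely
```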
The larger the time constant, the more the lag in response of y(t), but the smoother the curve. This may be seen another way: the e to y system is equivalent to an R, L circuit with positions of shafts analogous to voltages, as shown in Figure 4. With the time constant 1/m small, y follows e closely, including the irregularities; with 1/m large, y(t) is smooth compared to e but lags considerably.
Movement of the hand wheel does not affect y(t) but shifts x(t) up or down with respect to y. If the operator turns the wheel to give equal positive and negative movements of the dial, it may be seen that in the "steady state" (say with h(t) increasing linearly with t) there is a constant lag even when the damping is low and the interpolation nearly linear. In this case the system bridges linearly between the mid-ordinates of the steps, while actually it should bridge between the points $t_n + 0$. With higher damping the shape becomes worse, but the interpolated exponentials are nearer to the true curve most of the time. We shall find a formula for the best time constant of the system under the following assumptions:
1. That the "best" time constant is the one making the actual error least in the mean-square sense.
2. That we may take as the true curve, so far as our knowledge goes, the linear interpolation between the points $t_n + 0$. This may be justified by the fact that the device cannot in any way perform higher order interpolation: the curve y(t) is convex upward whenever e(t) increased in its last step over the final value of y from the preceding step, and this is quite independent of the curvature of h(t).
3. That the system is in a "steady state", that is, that in the step under consideration y(t) ends at the same distance below e(t) as it was just before the step.
4. That the steps come at approximately equal intervals of a seconds.
An interval under these conditions is shown in Figure 5. Here we assumed that the hand wheel was turned to give a ratio c/b = k, where c is the deflection of the dial just after a step and b is the change in the dial reading from just before to just after the step.
We have
$$y = A e^{-mt}$$
with
$$y(0) = b + y(a), \qquad A = b + A e^{-am}.$$
Hence
$$y = \frac{b\,e^{-mt}}{1 - e^{-am}}.$$
Also
$$x = y - y(0) + c = b\,\frac{e^{-mt} - 1}{1 - e^{-am}} + c.$$
The integral of the squared error per second is then
$$E^2 = \frac{1}{a}\int_0^a \left[ b\,\frac{e^{-mt} - 1}{1 - e^{-am}} + c + \frac{b\,t}{a} \right]^2 dt.$$
Writing $D = am$, $\tau = t/a$ and $c = kb$, this becomes
$$\frac{E^2}{b^2} = \int_0^1 \left[ \frac{e^{-D\tau} - 1}{1 - e^{-D}} + k + \tau \right]^2 d\tau,$$
which can be integrated in closed form as an expression in D, k and $e^{-D}$.
It is evident from physical considerations that the minimum of this expression occurs for a fairly large D. In fact the error curve was plotted for k = .5 (Figure 6), and the minimum is seen to be at about 7 or 8. With D this large the above expression is very nearly equal to
$$\frac{E^2}{b^2} \approx k^2 - k + \frac{1}{3} - \frac{3 - 4k}{2D} + \frac{2}{D^2},$$
since $e^{-D}$ is very small. To locate the minimum we have
$$\frac{d}{dD}\left(\frac{E^2}{b^2}\right) = \frac{3 - 4k}{2D^2} - \frac{4}{D^3} = 0,$$
so that $(6 - 8k)D = 16$, whence
$$D = \frac{8}{3 - 4k}.$$
For k = 1/2, D = 8.
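As a quick numerical check on this first approximation, the sketch below scans the large-D expression $E^2/b^2 \approx k^2 - k + 1/3 - (3-4k)/(2D) + 2/D^2$ over a grid of D for k = 1/2 and compares the grid minimum with $8/(3-4k)$. The grid range and spacing are arbitrary choices for the illustration.

```python
import math

def mean_square_large_D(D, k):
    """Large-D approximation to E^2/b^2 (terms in exp(-D) dropped)."""
    return k * k - k + 1.0 / 3.0 - (3.0 - 4.0 * k) / (2.0 * D) + 2.0 / D ** 2

def best_D_first_approx(k):
    """Zero of the D-derivative of the approximation: D = 8 / (3 - 4k)."""
    return 8.0 / (3.0 - 4.0 * k)

k = 0.5
grid = [4.0 + 0.01 * i for i in range(1201)]  # D from 4 to 16 in steps of 0.01
D_grid = min(grid, key=lambda D: mean_square_large_D(D, k))
print(D_grid, best_D_first_approx(k))                             # both about 8
print(math.sqrt(mean_square_large_D(best_D_first_approx(k), k)))  # RMS error / b, about 0.228
```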
Since the minimum is so flat (Figure 6) this formula is certainly close enough. However, a second approximation may be found as follows: for x small, $1/(1 - x) \approx 1 + x$. Using this in the exact expression to eliminate the denominators, and setting the derivative with respect to D equal to zero, we get as a second approximation
$$-8 + (3 - 4k)D + \left[6D(D + 1) + 2D^3(k - 1)\right] e^{-D} = 0.$$
Using the first approximation to obtain the values involving exponentials, a better value may be obtained. For k = 1/2 the second approximation is D = 8.03. The first and second approximations are plotted in Figure 7.
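The procedure just described, putting the current estimate of D into the exponential terms and solving the remaining linear equation for D, can be sketched as a fixed-point iteration on the second-approximation equation as given above. Starting from the first approximation D = 8 for k = 1/2, it settles near the quoted 8.03 after a step or two.

```python
import math

def second_approximation(k, iterations=5):
    """Iterate -8 + (3 - 4k) D + [6D(D + 1) + 2D^3 (k - 1)] e^{-D} = 0 for D,
    evaluating the exponential bracket at the previous estimate each time."""
    D = 8.0 / (3.0 - 4.0 * k)  # first approximation as the starting value
    for _ in range(iterations):
        bracket = 6.0 * D * (D + 1.0) + 2.0 * D ** 3 * (k - 1.0)
        D = (8.0 - bracket * math.exp(-D)) / (3.0 - 4.0 * k)
    return D

print(round(second_approximation(0.5), 2))  # 8.03
```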
With k = 1/2 the curve x(t) is plotted for an interval with the "best" D in Figure 8. It will be noted that the curve is highly damped in comparison to the time between readings. The RMS error is then approximately 0.23 b.
It is interesting to compare this with the RMS errors obtained under other conditions. If the device is not used at all, but a direct coupling made between the input and output, the RMS error between the step function and the linear interpolation between the points $t_n + 0$ is given by
$$E^2 = \frac{1}{a}\int_0^a \left[0 - \left(-\frac{b\,t}{a}\right)\right]^2 dt = \frac{b^2}{3}, \qquad \frac{E}{b} = \frac{1}{\sqrt{3}} = .577,$$
so that the RMS error has been reduced to 40% of this value.
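The 40% figure can be checked by evaluating the mean-square error integral numerically rather than through the large-D approximation. The sketch below applies the midpoint rule to $E^2/b^2 = \int_0^1 [(e^{-D\tau}-1)/(1-e^{-D}) + k + \tau]^2\,d\tau$ with k = 1/2 and D = 8, and divides by the direct-coupling value $1/\sqrt{3}$. The number of sample points is an arbitrary choice.

```python
import math

def mean_square_error(D, k, n=20000):
    """E^2 / b^2 = integral over tau in [0, 1] of
    [(exp(-D tau) - 1) / (1 - exp(-D)) + k + tau]^2, by the midpoint rule."""
    denom = 1.0 - math.exp(-D)
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) / n
        err = (math.exp(-D * tau) - 1.0) / denom + k + tau
        total += err * err
    return total / n

smoothed = math.sqrt(mean_square_error(8.0, 0.5))  # RMS error / b with k = 1/2, D = 8
direct = math.sqrt(1.0 / 3.0)                      # direct coupling: b / sqrt(3)
ratio = smoothed / direct
print(f"{smoothed:.3f} {direct:.3f} {ratio:.2f}")
```

The ratio comes out a little under 0.40, in agreement with the figure quoted above.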
In Figure 9 the output of the smoothing mechanism, x(t), is plotted for a certain forcing function e(t), using the "best" value of m. It may appear that the output is still far from smooth, and this is in a sense true, but it must be remembered that the variations in e(t) are here greatly exaggerated over what would be expected in practice.
Finally, it should be pointed out that a very material improvement in operation could be obtained if the operator were trained to turn the hand wheel to obtain a ratio c/b nearer to zero than 1/2. This, however, would probably be impractical.
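The size of this improvement can be estimated from the large-D approximation $E^2/b^2 \approx k^2 - k + 1/3 - (3-4k)/(2D) + 2/D^2$, with D set to its first-approximation optimum $8/(3-4k)$. The sketch below tabulates the resulting RMS error for several values of k = c/b. Within this approximation the error is smallest at k = 1/4 (about 0.14 b, against 0.23 b at k = 1/2) and returns to the k = 1/2 value at k = 0. Because the optimum D falls to 4 at k = 1/4, where $e^{-D}$ is no longer negligible, these numbers should be read as rough.

```python
import math

def best_rms_over_b(k):
    """RMS error / b at D = 8/(3 - 4k), using the large-D approximation."""
    D = 8.0 / (3.0 - 4.0 * k)
    mean_square = k * k - k + 1.0 / 3.0 - (3.0 - 4.0 * k) / (2.0 * D) + 2.0 / D ** 2
    return math.sqrt(mean_square)

for k in (0.5, 0.375, 0.25, 0.125, 0.0):
    print(f"k = {k:5.3f}   D = {8.0 / (3.0 - 4.0 * k):5.2f}   E/b = {best_rms_over_b(k):.3f}")
```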
[Figures: Fig. 1, schematic of the mechanism (labels DIAL, CORRECTING HAND WHEEL); Fig. 2.]
SOME EXPERIMENTAL RESULTS
ON THE DEFLECTION MECHANISM
Claude E. Shannon
June 26, 1941
In a previous report, "A Study of the Deflection Mechanism and Some Results on Rate Finders," a mathematical study was made of a new type of deflection mechanism. The present paper is a further study of this device and a report on some experimental results obtained on the M.I.T. differential analyzer.
For convenience in reference, the schematic diagram of the machine is repeated in Fig. 1. In the report mentioned, the utility of the middle part of the device was questioned. This arose from a misunderstanding of the basic assumptions underlying the design and was cleared up in a conference with Dr. Tappert. The writer's analysis was under the assumption that the mechanism was designed to find rates for linear forcing functions only (i.e., that higher order terms were small by comparison), and the analysis is still valid if this is true. However, in practice it appears necessary to assume higher order forcing functions, and the deflection mechanism is designed to give the correct steady state rate (except for the non-linearity of the sine gear) for an arbitrary quadratic forcing function. Actually the middle part (often referred to hereafter as the "x" part) of the device is certainly well worth while, as will be seen from some of our experimental curves.
If a linear mechanism has a transfer admittance T(jω) from input e(t) to output q(t), then
$$Q(j\omega) = T(j\omega)\,E(j\omega),$$
where E and Q are the transforms of e and q. It is easily seen from transform theory that if e(t) = at + b, a necessary and sufficient condition that q(t) → a as t → ∞ is that
$$T(0) = 0, \qquad T'(0) = j.$$
If this condition is satisfied the system may be called a first order rate finder: after the transient has died out, the output is the derivative of the input whenever the latter is linear. Similarly, if
$$T(0) = 0, \qquad T'(0) = j, \qquad T^{(k)}(0) = 0, \quad k = 2, 3, \ldots, n,$$
we have an nth order rate finder: in the steady state it finds the rate of an nth degree polynomial forcing function. In the deflection mechanism we have a second order rate finder,
$$T(j\omega) = j\omega + c_3 (j\omega)^3 + c_4 (j\omega)^4 + \cdots,$$
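if we assume that $\sqrt{1 - q^2}$ is nearly 1. A circuit for an nth order rate finder of this kind, under the same approximation, is shown in Fig. 2. The admittance here is approximately
$$T(j\omega) = \frac{j\omega\left[1 + a_1 (j\omega) + \cdots + a_{n-1} (j\omega)^{n-1}\right]}{1 + a_1 (j\omega) + a_2 (j\omega)^2 + \cdots + a_{n+1} (j\omega)^{n+1}},$$
and the values of the constants in the mechanism give
$$T(j\omega) = \frac{j\omega\,(1 + 4.63\,j\omega)}{1 + 4.63\,j\omega + 5.73\,(j\omega)^2 + 1.094\,(j\omega)^3}.$$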
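The rate-finder conditions can be checked for these constants by expanding T as a power series in s = jω. The sketch below performs the series division of N(s)/D(s), with N = 1 + 4.63 s and D = 1 + 4.63 s + 5.73 s² + 1.094 s³ (the constants quoted above), and multiplies by s. The coefficients of s⁰ and s² vanish while that of s is 1, so T(0) = 0, T'(0) = j and T''(0) = 0, which is what a second order rate finder requires; the first nonzero correction is c₃ = -5.73.

```python
def series_divide(num, den, terms):
    """First `terms` power-series coefficients of num(s) / den(s) about s = 0."""
    out = []
    for n in range(terms):
        acc = num[n] if n < len(num) else 0.0
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * out[n - j]
        out.append(acc / den[0])
    return out

NUM = [1.0, 4.63]               # 1 + 4.63 s
DEN = [1.0, 4.63, 5.73, 1.094]  # 1 + 4.63 s + 5.73 s^2 + 1.094 s^3
T = [0.0] + series_divide(NUM, DEN, 4)  # T(s) = s N(s) / D(s)
print(T)  # coefficients of s^0 ... s^4
```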
In the previous report it was pointed out that, due to a clutch and stop on the input to the sine gear, values of q were limited to two horizontal lines (see Fig. 6 in that report). There is also a clutch and stop on the displacement of the lower integrator. This effectively further limits solutions to a parallelogram, as shown in Fig. 3. Actually the limitation is fictitious: the q shaft can turn an unlimited amount, but when this stop is in effect the stability point moves at such a speed as to be equivalent to q and λ moving along one side of the parallelogram. Thus if we keep the stable point stationary, paths of representative solutions will be as indicated in Fig. 3.
The trial solutions taken on the differential analyzer may be classified as follows:
I. Solutions taken with the mechanism as designed.
   A. Simple analytic forcing functions.
      1. $e(t) = a$
      2. $e(t) = at + b$
      3. $e(t) = at^2 + bt + c$
      4. $e(t) = at^3 + bt^2 + ct + d$
   B. Response for 8 typical target courses, the target vector velocity constant.
   C. The response to some error functions superposed on typical courses.
   D. An attempt to get backlash oscillation.
II. Approximately the same program, although less extensively, with the middle part eliminated.
III. A few runs with typical courses using three different third order rate finders.
The constants of the target courses used were as follows (see Fig. 4); $S_g$ = 150 yds/sec corresponds to 307 mi/hr:

Course   S_g (yds/sec)   V (yds)          h_m (yds)   θ
I        150             2,000            1,000       0°
II       150             2,000            500         0°
III      150             4,000            1,000       0°
IV       150             2,000            2,000       0°
V        150             4,000 - 40t      4,000       -14.96°
VI       150             2,000 - 40t      1,000       -14.96°
VII      96.6            3,000 - 115t     1,000       -60°
VIII     150             4,000            500         0°
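Conclusion 3 of this report relates the errors to the target's elevation at its nearest point. Assuming that V is the target altitude at t = 0 and $h_m$ its horizontal range at the nearest point (an interpretation suggested by the diving courses, where V decreases with t, but not stated in this passage), that elevation is $\tan^{-1}(V/h_m)$. The sketch below tabulates it for the eight courses and checks the conversion of 150 yds/sec to mi/hr.

```python
import math

# (S_g yds/sec, V yds at t = 0, h_m yds) per course, read off the table above.
COURSES = {
    "I": (150, 2000, 1000), "II": (150, 2000, 500), "III": (150, 4000, 1000),
    "IV": (150, 2000, 2000), "V": (150, 4000, 4000), "VI": (150, 2000, 1000),
    "VII": (96.6, 3000, 1000), "VIII": (150, 4000, 500),
}

def elevation_at_nearest_point(V, h_m):
    """Elevation in degrees, ASSUMING V is altitude and h_m the horizontal range at nearest approach."""
    return math.degrees(math.atan2(V, h_m))

def yds_per_sec_to_mph(v):
    return v * 3600.0 / 1760.0  # 3600 s per hour, 1760 yds per mile

for name, (S_g, V, h_m) in COURSES.items():
    print(f"{name:5s} elevation ~ {elevation_at_nearest_point(V, h_m):5.1f} deg")
print(f"150 yds/sec = {yds_per_sec_to_mph(150):.0f} mi/hr")
```

Under this reading course IV sits at 45° and course VIII above 80°, which fits the report's description of IV as a flat course and VIII as the most difficult.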
The distribution of these courses is indicated in Fig. 5, together with the approximate maximum range of the 3" A.A. gun (21 sec. fuse setting).
The actual input to the deflection mechanism is
$$\dot e = \frac{S_g h_m t_p}{h_0 h_p},$$
but since it was desired to compare the actual output with the true deflection $\sin^{-1}\dot e$, the quantity $\dot e$ was plotted against t and integrated to provide the input. To calculate $\dot e$ the following method was found to be the simplest. We have
$$\dot e = \frac{S_g h_m t_p}{h_0 h_p}.$$
A computation schedule was set up based on this formula, working backwards from the time of burst $t + t_p$ to the present time t:
I. $t + t_p$ (assumed)
II. $h_p = \sqrt{h_m^2 + S_g^2 (t + t_p)^2}$
III. $V_p$ = V at time $t + t_p$
IV. $t_p$, from ballistic curves
V. $t$ = I - IV
VI. $h_0 = \sqrt{h_m^2 + S_g^2 t^2}$
VII. $\dot e$, from the formula above
The ballistic data used in getting $t_p$ (IV) were read from the chart (Fig. 24, opposite p. 59) of the Coast Artillery Field Manual, FM 4-110. The value of $t_p$ was merely read off corresponding to the computed values of $h_p$ and $V_p$.
If we assume as an approximation that the shell velocity is constant, k yds/sec (i.e., that the equi-time of flight curves in the chart are circles), so that with V constant
$$k^2 t_p^2 = h_p^2 + V^2, \qquad h_p^2 = h_m^2 + S_g^2 (t + t_p)^2, \qquad h_0 = \sqrt{h_m^2 + S_g^2 t^2},$$
we can eliminate $t_p$ and $h_p$ from the system to obtain an equation between $\dot e$ and t which is quadratic in $\dot e$ [equation illegible]. Evidently the same curve $\dot e(t)$ is obtained if $h_m$ and $S_g$ are both multiplied by the same constant.
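The backwards schedule is easy to mechanize once a time of flight is available. The sketch below replaces the FM 4-110 ballistic chart (column IV) by the constant-shell-velocity approximation $k\,t_p = \sqrt{h_p^2 + V^2}$ of the preceding paragraph and evaluates $\dot e = S_g h_m t_p/(h_0 h_p)$ for a level course with the constants of Course I. The shell velocity k = 600 yds/sec is a hypothetical figure chosen only for illustration, and the function name is likewise made up for this sketch.

```python
import math

def schedule_row(T, S_g, h_m, V, k):
    """One row of the backwards schedule for a level course, starting from an
    assumed burst time T = t + t_p (column I). Time of flight comes from the
    constant-velocity approximation k * t_p = slant range, NOT the ballistic chart."""
    h_p = math.hypot(h_m, S_g * T)         # II: horizontal range at burst
    t_p = math.hypot(h_p, V) / k           # IV: time of flight (approximation)
    t = T - t_p                            # V: present time
    h_0 = math.hypot(h_m, S_g * t)         # VI: present horizontal range
    e_dot = S_g * h_m * t_p / (h_0 * h_p)  # VII
    return t, t_p, e_dot

K_HYPOTHETICAL = 600.0  # yds/sec, illustrative shell velocity only
for T in range(-20, 21, 10):
    t, t_p, e_dot = schedule_row(float(T), 150.0, 1000.0, 2000.0, K_HYPOTHETICAL)
    print(f"t + t_p = {T:4d}  t = {t:7.2f}  t_p = {t_p:5.2f}  e_dot = {e_dot:.3f}  "
          f"deflection = {math.degrees(math.asin(e_dot)):5.1f} deg")
```

Working backwards from the burst time avoids solving for $t_p$ iteratively, which is presumably why the schedule was arranged this way.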
The differential analyzer set-up used is shown in Fig. 6. An attempt was made to generate the sine function with two integrators solving $d^2y/dx^2 = -y$, but this was found impractical because of the large integrator loading necessary, and an input table was used instead. Even in this case it was necessary to use a very large scale factor on the independent variable shaft, due to the small integrating factors of the differential analyzer as compared to the ball type (about 1 under comparable conditions). This resulted in solutions which represented, actually, 30 seconds requiring 30 minutes of machine time.
The equations of the deflection mechanism are
$$\dot x + .54\,x = .54\,\ddot e,$$
$$\frac{\ddot q}{\sqrt{1 - q^2}} + 4.700\,\dot q + 1.692\,q = 1.692\,\dot e + 4.700\,x.$$
It was necessary to approximate the coefficients with available gear ratios on the differential analyzer. Fortunately some very close approximations were found. The equations actually set on the machine were
$$\dot x + .54\,x = .54\,\ddot e,$$
$$\frac{\ddot q}{\sqrt{1 - q^2}} + 4.706\,\dot q + 1.694\,q = 1.694\,\dot e + 4.706\,x.$$
The error is of the same order as the expected machine error.
Except for runs in group I D, the machine was made as "tight" as possible, the backlash being corrected by frontlash units. Due to the large scale factors used and the high inherent precision of the integrators used in the differential analyzer, the runs may be expected to be more accurate than the actual deflection mechanism.
Solutions were taken in the form of both curves and counter readings. The curves given here were reproduced by pantograph to ordinary graph paper size. Curves not directly drawn by the machine, and numerical values quoted, are taken from the counter printings, which give an additional decimal place not readable from the curves.
Discussion of Runs
Most of the curves are given with q as dependent variable. To estimate the error in yards for a given error in q from $\dot e$, the chart of Fig. 6A may be used. This is computed from the approximate formula
$$\Delta \approx \frac{r\cos\varepsilon}{\sqrt{1 - q^2}}\,\Delta q = r\,A(\varepsilon, q)\,\Delta q.$$
For rough comparisons the coefficient A may be taken as 1, the error then being the q error multiplied by the predicted range.
The first set of runs was taken with a sudden impulse $e = k\,1(t)$ with the system at rest, both with and without the middle part of the mechanism. Runs were taken with
k = 0.1, 0.2, 0.4, 1.0, 2.0.
Typical curves are shown in Figs. 7 and 8. The results are very close to curves computed on the assumption that $1/\sqrt{1 - q^2} = 1$ when k < .4, but above this the non-linearity becomes appreciable. In the worst cases the transient disappeared to within machine errors in 25 seconds, and for most cases within 8 to 12 seconds. The action with the middle part out was considerably more rapid than with it in, the transient with it in being 6 times as great, as had been predicted, this being a special case of a linear forcing function. Fig. 9 is a plot of the time required for the transient in q to reduce to 2/10 of its maximum value. For values of k greater than about .35 the curves cross the axis once with the middle part in. The curves with it out are all identical with k > 2, due to the action of the slip clutch on one integrator.
Next a series of runs was taken with
$$e = k\,t\,1(t),$$
starting from rest, with k chosen to give steady-state deflections $\sin^{-1} k$ = 15°, 30°, 45°, 60°, 75° and [illegible], the last being the limit of the sine gear, the maximum possible deflection. These runs are shown in Figs. 10 and 11. The transient died out in all cases within 20 seconds, except with x in for deflections above 75°, in which cases 30 seconds or more were required, due to the action of the slip clutch. These long transients, however, would probably not be troublesome, since such large deflections would only occur in practice with the plane almost directly overhead. For the smaller values the response is about equally rapid with x in or out.
Quadratic Forcing Functions
The runs with a quadratic forcing function
$$e = a t^2$$
were the first to show the superiority of the mechanism with x in. Runs were taken with
a = .01, .02, .03, .04, .10.
With a quadratic rate finder the solution q should approach 2at, and with x in this was very nearly true, the discrepancy being due to the sine gear. Some solutions are shown in Figs. 12, 13, and 14. The errors increase with a and with λ. The maximum slope found in any of the 8 courses plotted is about equivalent to an a of .05, so that the large errors due to the sine gear with a = .10 need not cause great concern.
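The claim that q approaches 2at can be checked on the linear approximation to the mechanism (sine gear ignored), $1.094\,\dddot q + 5.73\,\ddot q + 4.63\,\dot q + q = 4.63\,\ddot e + \dot e$, whose transfer admittance is the one quoted earlier in this report. The sketch below integrates it with a fourth-order Runge-Kutta step for e = at², starting from rest. The value a = .01, the 30-second run (which keeps q at or below .6, the range where the report says the linear approximation holds) and the step size are illustrative choices.

```python
def simulate_quadratic(a, t_end=30.0, dt=0.01):
    """RK4 integration of 1.094 q''' + 5.73 q'' + 4.63 q' + q = 4.63 e'' + e'
    for e(t) = a t^2, starting from rest. Returns (t_end, q(t_end))."""
    def deriv(t, state):
        q, q1, q2 = state
        e1, e2 = 2.0 * a * t, 2.0 * a  # e' and e'' for e = a t^2
        q3 = (4.63 * e2 + e1 - 5.73 * q2 - 4.63 * q1 - q) / 1.094
        return (q1, q2, q3)

    state = (0.0, 0.0, 0.0)
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        k1 = deriv(t, state)
        k2 = deriv(t + dt / 2, tuple(s + dt / 2 * d for s, d in zip(state, k1)))
        k3 = deriv(t + dt / 2, tuple(s + dt / 2 * d for s, d in zip(state, k2)))
        k4 = deriv(t + dt, tuple(s + dt * d for s, d in zip(state, k3)))
        state = tuple(s + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                      for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))
    return steps * dt, state[0]

t_end, q_end = simulate_quadratic(0.01)
print(q_end, 2 * 0.01 * t_end)  # q(30) is very close to 2at = 0.6
```

Because the linear system is a second order rate finder, the steady-state solution is exactly 2at with no constant offset; any remaining difference is the decaying transient.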
Cubic Forcing Functions
For cubic forcing functions the following were used:
$$e_1 = -.04\,t^3 + .1\,t^2, \qquad e_2 = -.001\,t^3 + .05\,t^2, \qquad e_3 = -.0002\,t^3 + .02\,t^2.$$
These were chosen as having second order tangency at t = 0, so that the transient is small. The results are shown in Figs. 15 and 16. The responses with $e_2$ and especially $e_3$ are very close to the values calculated on assuming the equation linear. The error with $e_1$ is somewhat greater, as in the quadratic case with higher acceleration.
Effect of Backlash
A number of runs were made to determine the effect of backlash, using several different forcing functions. In order to increase the amount of backlash, frontlash units were inserted at several critical points in the backwards direction. The results of these runs were, however, completely negative, for no oscillation of any sort was discovered. The system was given "shocks" by sudden turning of the e shaft and by other methods, but the solutions were completely stable. The only results were small consistent errors, of the order of magnitude of the backlash. It is possible that, due to the large scale factors used in the set-up, even the artificially introduced backlash was not sufficient to cause the oscillation effect.
Response for Typical Courses
The responses for the 8 courses described above are shown in Figs. 17 to 24. It may be noted that even on the flat courses (e.g., IV) the operation is poor without x. On the flat courses the response is satisfactory with x, the error being less than 20 yards except sometimes at the hump in e. However, for the steeper courses errors of 60 or more yards are common after the start of the peak, and these do not disappear until nearly the end of the course. The action is particularly bad coming down the hump. Fig. 25 is a plot of the error in yards with course VIII, x in.
Response to Error Functions
In Figs. 26-28 are shown the responses to some random error functions of various kinds superimposed on courses I and II. The operation in damping out the error is considerably better with x out. However, it seems from a consideration of the size of the errors introduced and the responses found that the system, even with x in, damps the errors more than necessary. That is, it might be preferable to increase the speed of response so as to reduce the transient errors in the solutions.
Figs. 29 and 30 show the responses when we suddenly start tracking a target in courses I or II with the machine previously at rest, with the target at several points along the course.
Tests with Different Equations
Three runs were made on course VIII, the most difficult one of the group, using three different cubic rate finding equations. The equations used were (assuming linearity) critically damped, with the transfer admittances
$$(1)\quad T(j\omega) = \frac{j\omega\left[1 + 4\left(\tfrac{j\omega}{2}\right) + 6\left(\tfrac{j\omega}{2}\right)^2\right]}{\left[1 + \tfrac{j\omega}{2}\right]^4}$$
$$(2)\quad T(j\omega) = \frac{j\omega\left[1 + 4(j\omega) + 6(j\omega)^2\right]}{\left[1 + (j\omega)\right]^4}$$
(3) [illegible]
The results of these runs are shown in Figs. 31, 32, and 33 and should be compared with Fig. 24. Of course, this gain is accompanied by a loss in error function damping. With the roots equal to 2 the system had a slight tendency to be unstable on the flat part of the course. This, however, appeared to be due to the "human backlash" in the operator on the sine table, and would probably not be present with a sine gear.
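For a critically damped cubic rate finder of the form used in (2), $T(s) = s\,(1 + 4x + 6x^2)/(1 + x)^4$ with $x = s/\rho$ and all four characteristic roots at $s = -\rho$, the numerator cancels the first three terms of the binomial expansion of $(1 + x)^4$. The sketch below expands T as a power series for ρ = 1 and ρ = 2 and shows that the coefficients of s² and s³ vanish, the third order rate-finder condition, with the first correction $-4s^4/\rho^3$ shrinking as the roots grow.

```python
import math

def cubic_rate_finder_series(rho, terms=6):
    """Power-series coefficients (in s) of T(s) = s (1 + 4x + 6x^2) / (1 + x)^4, x = s / rho."""
    num = [1.0, 4.0 / rho, 6.0 / rho ** 2]
    den = [math.comb(4, j) / rho ** j for j in range(5)]  # (1 + x)^4, den[0] = 1
    ratio = []
    for n in range(terms - 1):
        acc = num[n] if n < len(num) else 0.0
        for j in range(1, min(n, 4) + 1):
            acc -= den[j] * ratio[n - j]
        ratio.append(acc)
    return [0.0] + ratio  # multiply by s

for rho in (1.0, 2.0):
    print(rho, cubic_rate_finder_series(rho))
```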
It is easily seen that an increase in the values of the characteristic roots of the equation demands a proportional increase in the power requirements of the integrators. It may be that this will be a design limit in the case of mechanical systems. No difficulty would be experienced here, however, with electrical integrators.
The main conclusions of this work are as follows:
1. The middle part of the machine is definitely worth while. Although it increases response for accidental following errors, the gain in behavior for actual courses more than offsets this disadvantage.
2. The system behaves nearly enough like the linear system
$$1.094\,\dddot q + 5.73\,\ddot q + 4.63\,\dot q + q = 4.63\,\ddot e + \dot e$$
that this may be used to calculate its response to within a few per cent, providing q < .6. As this corresponds to a deflection of 37°, the approximation is sufficient for most cases.
3. For targets whose elevation at their nearest point is greater than about 50°, fairly large errors occur due to substantial cubic and higher degree terms in e. This indicates that it might be worth while to use a higher order rate finder. Tests made with a cubic rate finder showed greatly improved results.
4. If the additional cost of another integrator and adder required for cubic rate finding is too great to be justified, it appears that the system could be improved by reducing the time constants; for if sufficient power is available from the integrators, the only disadvantage would be increased response to random error functions, and our results indicate that these are now damped out more than necessary.
5. There is some indication that better results would be obtained by making the three time constants equal, or more nearly equal than they are now, although this is not certain.
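The three time constants mentioned in conclusion 5 can be read off the linear system of conclusion 2. Writing $1 + 4.63p + 5.73p^2 + 1.094p^3 = (1 + b_1 p)(1 + b_2 p)(1 + b_3 p)$, the $b_k$ are minus the reciprocals of the roots of the characteristic equation. The sketch below finds the three real roots by bisection, using no external libraries, and checks the result against the coefficients, since $b_1 + b_2 + b_3 = 4.63$ and $b_1 b_2 b_3 = 1.094$. It gives time constants of roughly 0.23, 1.8 and 2.6 seconds, far from equal.

```python
def characteristic(p):
    """1.094 p^3 + 5.73 p^2 + 4.63 p + 1, from the linear system in conclusion 2."""
    return ((1.094 * p + 5.73) * p + 4.63) * p + 1.0

def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Scan p in [-10, 0] for sign changes, then refine each bracket.
step = 0.01
grid = [-10.0 + step * i for i in range(1000)]
roots = [bisect(characteristic, p, p + step) for p in grid
         if characteristic(p) * characteristic(p + step) < 0]
time_constants = sorted(-1.0 / r for r in roots)
print([round(b, 3) for b in time_constants])
```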
Criteria for Consistency and Uniqueness in Relay Circuits
September 8, 1941
In a system of linear algebraic equations there are three possible types of degeneracy, namely inconsistency (no possible solution), ambiguity (solutions not uniquely determined) and redundancy (more equations than necessary). Necessary and sufficient conditions are known for these types of degeneracy in terms of the ranks of the coefficient and augmented matrices. Somewhat similar effects can occur in the Boolean equations characterizing relay circuits, giving rise respectively to chattering, ambiguity of relay positions for certain values of the independent variables, and redundancy [several lines illegible] discriminant.
Consider a relay circuit containing n relays
$R_1, R_2, \ldots, R_n$. Make and break contacts on $R_i$ are
designated $x_i$ and $x_i'$, and we suppose that there are m independent
variables $a_1, a_2, \ldots, a_m$, which do not depend on the relay
positions. Such a circuit is equivalent to the circuit of
Fig. 1 in which
$f_i(x_1, x_2, \ldots, x_n, a_1, a_2, \ldots, a_m)$
is the Boolean function which is zero when the switches
$x_1, \ldots, x_n, a_1, \ldots, a_m$ are in such positions that the
voltage drop in the original circuit is sufficient to operate
$R_i$, and one otherwise. The function
(1) $F = \sum_{i=1}^{n} \left( x_i f_i' + x_i' f_i \right)$
will be called the circuit discriminant. We also define the
following terms. A steady state in a relay circuit corresponding
to a given set of values of the independent variables
$A_1, A_2, \ldots, A_m$ is a set of positions $P_1, P_2, \ldots, P_n$ of the
relays such that if the independent variables are given
the values $A_1, \ldots, A_m$ and the relays are held in the positions
$P_1, \ldots, P_n$ long enough for the steady state fluxes in the
coils to build up, the relays will remain in the same
positions indefinitely.
A completely oscillatory state of a relay circuit
is a set of values $A_1, A_2, \ldots, A_m$ of the independent variables
such that no matter what the initial positions of the relays,
or how long they are held in those positions, once they are
released at least one makes an infinite number of oscillations,
i.e., chatters. In addition to these obviously exclusive
possibilities, a circuit may be "partially" oscillatory for certain
values of the independent variables: with some initial conditions
the circuit chatters and with others it relapses into a
steady state. An example is shown in Figure 2, where with
the initial condition
$x_1 = 0$ (operated)
the circuit chatters, while with
$x_1 = 1$ (unoperated)
the circuit relapses into the steady state $R_1 = 1$, $R_2 = 1$.
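Theorem I. For $A_1, \ldots, A_m$; $P_1, \ldots, P_n$ to be a steady
state it is necessary and sufficient that
$F(P_1, \ldots, P_n, A_1, \ldots, A_m) = 0$.
This is necessary since in a steady state the position of each
relay must agree with its drive, i.e.,
$x_i = f_i$ when $x_j = P_j$ and $a_k = A_k$,
so that every term of (1) vanishes. It is sufficient since
$F = 0$ gives $f_i(P_1, \ldots, P_n, A_1, \ldots, A_m) = P_i$ for every i,
so that if the relays are held in these positions long
enough for fluxes to build up they will remain there.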
Theorem II. For $A_1, \ldots, A_m$ to be completely oscillatory
it is necessary and sufficient that
$F(x_1, \ldots, x_n, A_1, \ldots, A_m) = 1$
identically in the $x_i$. This is necessary since otherwise
there is a set of $x_i$, say $P_i$, such that $F = 0$, and
this is a steady state by Theorem I. It is sufficient
since, if this holds, then with any starting position, say
$P_1, \ldots, P_n$, at least one term of the sum (1), say $x_k f_k' + x_k' f_k$,
is equal to one, so that $x_k \neq f_k$ and relay $R_k$ is driven
to change its position. After some relay has changed we still
have the same situation, since $F = 1$, so that at least one
relay makes an infinite number of changes of position.
If, for the given $A_1, \ldots, A_m$, $F(x_1, \ldots, x_n, A_1, \ldots, A_m)$
is a function of the $x_i$ (not identically one or zero), the system
has some steady states, namely the roots of $F = 0$, but for
arbitrary starting conditions we cannot say what the motion
will be. Whether a circuit seeks out a steady state or not
depends not only on the network topology, as in Fig. 2, but
also on the relay characteristics, as in Fig. 3. Here if $R_1$ is
slow operating and $R_2$ very fast, the circuit may chatter
with both relays initially unoperated, for $R_2$ may never
stay in long enough to operate $R_1$. If $R_1$ is fast and
$R_2$ slow releasing, the system relapses into $R_1 = 0$, $R_2 = 1$.
Hence no purely algebraic conditions can be set up to determine
whether a circuit will relapse into a steady state when
$F$ is a function of the $x_i$.
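Theorems I and II reduce to a finite check once the drive functions $f_i$ are written down: the steady states for given inputs are the roots of $F = \sum_i (x_i f_i' + x_i' f_i) = 0$, and the inputs are completely oscillatory exactly when no root exists. The sketch below assumes the hindrance convention of the memo (0 = contact closed or relay operated); the two example circuits, a self-interrupting buzzer and a relay locked in through its own make contact, are hypothetical illustrations, not circuits from the memo, and all function names are made up here.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]
# Hindrance convention: x_i = 0 means relay R_i is operated (make contact closed);
# f_i(x, a) = 0 means the network supplies enough voltage to operate R_i.
DriveFn = Callable[[Bits, Bits], int]

def discriminant(fs: Sequence[DriveFn], x: Bits, a: Bits) -> int:
    """F = sum_i (x_i f_i' + x_i' f_i) with the Boolean sum taken as OR.
    F = 0 exactly when x_i == f_i for every relay."""
    return int(any(xi != f(x, a) for xi, f in zip(x, fs)))

def steady_states(fs: Sequence[DriveFn], a: Bits) -> List[Bits]:
    """Theorem I: the steady states for inputs a are the roots of F = 0."""
    return [x for x in product((0, 1), repeat=len(fs))
            if discriminant(fs, x, a) == 0]

def completely_oscillatory(fs: Sequence[DriveFn], a: Bits) -> bool:
    """Theorem II: F = 1 identically in the x_i, i.e. there is no root."""
    return not steady_states(fs, a)

# Hypothetical buzzer: the coil is fed through the relay's own break contact,
# so the coil hindrance is x_1' (driven exactly when the relay is unoperated).
buzzer = [lambda x, a: 1 - x[0]]

# Hypothetical latch: input contact a_1 in parallel with the relay's own make
# contact; parallel hindrances multiply, so f_1 = a_1 * x_1.
latch = [lambda x, a: a[0] & x[0]]

print(steady_states(buzzer, ()))     # no steady state: the buzzer chatters
print(steady_states(latch, (1,)))    # two steady states: position is ambiguous
```

With $a_1$ open the latch has two steady states, an instance of the ambiguity of relay positions that the discriminant exposes; the buzzer has none for any input, so by Theorem II it must chatter.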
July 15, 1943
ON THE INTEGRATION OF THE
BALLISTIC EQUATIONS ON THE ABERDEEN ANALYZER
by
Professor W. Feller of Brown University and
Dr. C. E. Shannon of the Bell Telephone Laboratories
AMP REPORT NO. 28.1
APPLIED MATHEMATICS PANEL
NATIONAL DEFENSE RESEARCH COMMITTEE
This is a report on investigations made at the request
of Dr. Warren Weaver (letter of December 28, 1942). Our study
has been based partly on oral information received in Aberdeen
(January 18, 1942) and partly on the material contained in the
Report No. 319 of the Ballistic Research Laboratory ("Report
on the Differential Analyzer at Aberdeen Proving Ground" by
Major A. A. Bennett, December 1942). The technical set-up
as described in that report will in the sequel be referred to
as "present set-up". It should be clearly understood that we
were not to study possible technical improvements of the ana-
lyzer as such nor to reexamine the theory underlying the dif-
ferential equations. Accordingly, the present report is con-
cerned only with an examination of the procedure of mechanical
integration of the differential equations of ballistics as
used at present. Furthermore, we have not considered any methods
of integration other than on the differential analyzer.
Before proceeding to describe devices which might
contribute to the efficiency of the analyzer we wish to summarize
some negative findings, as these may render superfluous similar
investigations by other persons.
a) We have carefully investigated a great number of
alternative set-ups, on the differential analyzer, of the dif-
ferential equations either in their present form or using
various new variables. However, we have been unable to find
any form superior to the method as used at present in Aberdeen
which, in our opinion, is the most efficient one.
b) We have studied the advisability of using some
method of successive approximations. Such methods naturally
present themselves since one should expect them to reduce the
ranges of the variables involved and thus increase the accuracy.
However, a closer study will show that it is almost invariably
necessary to subtract, on the analyzer, two large quantities
which are themselves independently obtained on the analyzer.
This, of course, nullifies the desired effect of reducing the
ranges. Various possibilities have been studied and, among
them, the possibility of starting with the vacuum trajectories
and integrating the difference between them and the actual
trajectories. Again we were unable to find a method which
would appear superior to the present set-up. It will be noted,
however, that the modification of the latter suggested below
can in some sense be interpreted as the first step in a method
of successive approximations.
c) Several perturbation methods and expansions
according to various parameters have been tried paying special
attention to methods suggested in the newest Russian literature.
None of these methods seems appropriate for the analyzer.
Coming to the less negative part of this report, we
remark that an adequate theory of errors of the differential
analyzer is not available at present. However, simple theoretical
considerations based on experience gathered at M.I.T. make it
appear that a very considerable part of the total error is due
[...] of error are backlash and, perhaps even more, inaccuracies in
the following mechanism for the input and vector tables. It
seems therefore possible to achieve a gain in accuracy by
reducing the range of the variables in the integrators, even
though this may necessitate the introduction of new adders
and gears. The following recommendations are based on this
assumption. We proceed step by step, starting with the simplest
case.
Recommendations.
1) Consider, to begin with, the horizontal displacement
$x$. Obviously $dx/dt = \dot x$ will range from its maximum $\dot x_0$ at
the beginning to some fraction of it, say $q\dot x_0$, at the end.
Accordingly, when integrating in the usual form
(1) $x = \int \dot x \, dt$
the integrand ranges from $q\dot x_0$ to $\dot x_0$. Now this means that
only a fraction $\frac{1-q}{2}$ of the total range of the integrator
disc is used, even if we suppose that the scale factor has been
chosen in the best way (so that the rim of the integrator disc
is used for values of $\dot x$ near $\dot x_0$). If, instead, we write
(2) $x = \frac{1+q}{2}\,\dot x_0\, t + \int \left( \dot x - \frac{1+q}{2}\,\dot x_0 \right) dt$,
the integrand will range from its maximum $\frac{1-q}{2}\,\dot x_0$ to its
minimum $-\frac{1-q}{2}\,\dot x_0$.
This allows one to use a scale factor
$\frac{2}{1-q}$ times as large as in the set-up (1) and to utilize
the entire integrator disc. This, of course, means a considerable
gain.
Now the constant $\frac{1+q}{2}\,\dot x_0$ in the integral in (2)
appears only as an initial displacement. It is therefore seen
that the realization of the proposed set-up (2) requires, as
compared with the customary set-up (1), an additional gear (to
produce $\frac{1+q}{2}\,\dot x_0\, t$) and an adder. The following figure shows
the simplest mechanization.
[Figure: simplest mechanization of set-up (2), with shafts $t$, $\frac{1+q}{2}\,\dot x_0\, t$ and $x$.]
It goes without saying that the gear ratio does not need to
be exactly $\frac{1+q}{2}\,\dot x_0$; any number near the middle of the range
of the integrand will do the same service.
If used to its fullest extent, the system as described
changes a previously positive variable into one taking on also
negative values. Although only one change of sign is introduced,
this will introduce some new backlash. Now, if instead of (2)
we mechanize
(3) $x = q\dot x_0\, t + \int \left( \dot x - q\dot x_0 \right) dt$,
the new integrand does not change sign, and no new backlash is
introduced. On the other hand, the optimum scale factor for
(3) is only $\frac{1}{1-q}$ times that for (1), that is to say half the
scale factor for (2). We conclude that with proper corrections
for backlash the set-up (2) should prove best. However, if
enough frontlash units are not available at Aberdeen, the
set-up (3) may be tried with advantage.
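The gains claimed for set-ups (2) and (3) can be checked numerically. The sketch below assumes, purely for illustration, that $\dot x$ falls linearly from $\dot x_0$ to $q\dot x_0$ over the run (real trajectories are not linear), integrates it in the forms (1), (2) and (3), and records the largest integrand each form presents to the integrator disc. The function name and the default parameter values are hypothetical.

```python
def compare_setups(xdot0: float = 1.0, q: float = 0.25, T: float = 10.0,
                   steps: int = 10_000):
    """Integrate a hypothetical, linearly falling xdot(t) in the forms (1), (2), (3).

    Returns the final displacement x(T) from each set-up and the largest
    |integrand| each one places on the integrator disc (midpoint rule)."""
    c2 = (1 + q) / 2 * xdot0          # constant taken out of the integrand in (2)
    c3 = q * xdot0                    # constant taken out of the integrand in (3)
    dt = T / steps
    totals = [0.0, 0.0, 0.0]
    peaks = [0.0, 0.0, 0.0]
    for k in range(steps):
        t = (k + 0.5) * dt
        xdot = xdot0 * (1 - (1 - q) * t / T)      # falls from xdot0 to q*xdot0
        for i, c in enumerate((0.0, c2, c3)):
            totals[i] += (xdot - c) * dt
            peaks[i] = max(peaks[i], abs(xdot - c))
    # The extra gear and adder restore the linear term c*t removed from the integrand.
    x = [totals[0], c2 * T + totals[1], c3 * T + totals[2]]
    return x, peaks

x, peaks = compare_setups()
print(x)                                      # the three set-ups agree on x(T)
print(peaks[0] / peaks[1], peaks[0] / peaks[2])
```

The two printed ratios come out close to $2/(1-q)$ and $1/(1-q)$, the factors by which the scale factor of the integrator may be raised in set-ups (2) and (3), while the three values of $x(T)$ coincide.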
2) A similar device can obviously be used wherever
the range of the integrand does not utilize the integrator
disc to its fullest extent. This is true for almost all
integrators whose outputs are:
(i) the horizontal displacement $x$,
(ii) $s = \int v\, dt$, $v$ being the speed,
(iii) $e^{-hy}$, where $y$ is the height.
In the first two cases the new set-up would not produce any
additional loading since the integrators are driven by the
independent variable motor. In other cases an additional
loading would ensue which may have to be compensated by the
use of a larger scale factor on the t-shaft; this would
indirectly slow down the machine. Whether this will have to be
done is impossible to predict theoretically. Should it prove
necessary, it would be for the user to decide whether the gain
in accuracy is worth the loss in speed.
3) If the above described device should prove in[...]
[...] at the expense of [...]
considerable manual work and loss [...]. The process of
integration may be stopped at convenient [...]
intervals. Consider, for example, [...]
indicated in the figure.
[Figure]
Here, even the usual procedure of integration utilizes the
entire range of the integrator disc and no gain can be achieved
by means of the device as described above. However, the integrand
may conveniently be treated by a double application of this
device, splitting the interval of integration into two parts.
In other words, instead of a given function $f(x)$ we integrate
the difference between $f(x)$ and a step-function. The output
of the integrator is no longer $F(x)$, but the
difference between $F(x)$ and a triangular (or "roof") function.
[Figure: $f(x)$.]
Similarly, with a convenient subdivision we may use any step-
function for the integrand and the corresponding polygonal
line for the integral.
This procedure obviously requires resetting the
integrator in question and changing one gear ratio each time
the machine is stopped. On the other hand, the increase of the
scale factor is roughly proportional to the number of subintervals.
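The claim that the scale factor grows roughly in proportion to the number of subintervals can be illustrated with a simple case. The sketch below is a hypothetical model: it takes the step function's level on each subinterval to be the integrand's value at the subinterval midpoint (the report does not say how the steps are chosen), integrates the residual, and adds back the polygonal line contributed by the step function.

```python
def step_split(f, a: float, b: float, k: int, samples: int = 1000):
    """Integrate f over [a, b] as (step-function part) + (residual part).

    Returns the integral of the step function (a polygonal line in x), the
    integral of the residual actually fed to the integrator, and the largest
    |residual|, which is what limits the integrator's scale factor."""
    width = (b - a) / k
    polygon = residual = peak = 0.0
    for j in range(k):
        lo = a + j * width
        level = f(lo + width / 2)     # assumed choice of step height: midpoint value
        h = width / samples
        for i in range(samples):
            r = f(lo + (i + 0.5) * h) - level
            residual += r * h         # midpoint rule for the residual integral
            peak = max(peak, abs(r))
        polygon += level * width
    return polygon, residual, peak

for k in (1, 2, 4):
    p, r, pk = step_split(lambda x: x, 0.0, 1.0, k)
    print(k, p + r, pk)               # same integral, peak residual shrinking like 1/k
```

For the integrand $f(x) = x$ on $[0, 1]$ the total stays at $1/2$ while the largest residual drops from about $1/2$ to about $1/8$ as the number of subintervals goes from 1 to 4, so the integrator scale factor could be raised about fourfold. With a single subinterval this is the device of recommendation 1) with $q = 0$.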
4) In principle this procedure may be looked upon
as a special case of the following more general method. Instead
of
(4) $v(x) = \int y \, dx$
write
(5) $v(x) + \varphi(x) = \int \left( y + \varphi'(x) \right) dx$,
where $\varphi(x)$ is an arbitrary function and $\varphi'(x)$ its derivative.
In practice, of course, $\varphi(x)$ should be chosen so as to render
the maximum of $|y + \varphi'|$ as small as possible in order to
increase the scale factor on the integrator. Now if $\varphi(x)$ is
not a linear function, the mechanization of (5) would require
two new input tables or their equivalent. However, the
possibility of obtaining some special $\varphi(x)$ by means of non-circular
gears should not be overlooked. This would mean a considerable
improvement of the linear method.
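The general form (5) follows by differentiating both sides, and the earlier devices appear as particular choices of $\varphi$. A short derivation, assuming for definiteness that the integrals start at a point $x_0$ where $v$ and $\varphi$ both vanish (a normalization added here, not stated in the report):

```latex
\frac{d}{dx}\bigl[v(x) + \varphi(x)\bigr] = y + \varphi'(x)
\;\Longrightarrow\;
v(x) + \varphi(x) = \int_{x_0}^{x} \bigl(y + \varphi'(\xi)\bigr)\, d\xi ,
\qquad v(x_0) = \varphi(x_0) = 0 .

% Set-up (2) of recommendation 1): variable t, integrand \dot x, linear \varphi
\varphi(t) = -\tfrac{1+q}{2}\,\dot x_0\, t
\;\Longrightarrow\;
x - \tfrac{1+q}{2}\,\dot x_0\, t = \int_0^{t} \Bigl(\dot x - \tfrac{1+q}{2}\,\dot x_0\Bigr)\, dt' .

% Device of recommendation 3): \varphi' = -s with s a step function
\varphi'(x) = -s(x)
\;\Longrightarrow\;
v(x) - \int_{x_0}^{x} s(\xi)\, d\xi = \int_{x_0}^{x} \bigl(y - s(\xi)\bigr)\, d\xi .
```

In the last line the subtracted integral of the step function is the polygonal line; a linear $\varphi$ costs only a gear and an adder, which is why a non-linear $\varphi$ calls for extra input tables or non-circular gears.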
5) We have been asked by Dr. Dederick to consider
whether it would be advantageous to generate [...] from an
input table (instead of by integration, as at present). The
foregoing remarks contain an answer to this question. It is
not difficult to see that the present method of obtaining the
function by integration is more efficient. It would probably
become even more so if the recommendation 2) were put into
effect.
6) Although it is not directly connected with the
subject of this report, we enclose an Appendix describing a
simplified method for computing gear ratios. This method is
based on previous experience (of one of us) at M.I.T. and may
prove useful in connection with ballistic work on the Aberdeen
Analyzer.
Brown University, Providence, R.I.
and
Bell Telephone Laboratories, N.Y.
May 27, 1943.
W. Feller
C.E. Shannon
RESTRICTED
A METHOD OF DETERMINING GEAR RATIOS
In this appendix a simplified method of determining
gear ratios for an analyzer set up will be described which
was used for some time on the M.I.T. analyzer and proved in
general to be considerably faster and easier to change than
the original method of equalities and inequalities. The
method may be briefly outlined as follows:
1. Draw the set up with an unknown gear ratio in
each shaft of limited displacement. An unspecified
ratio is also placed in the two inputs of each adder.
2. Calculate an approximate scale factor on the
independent variable to give the expected time of
solution at the average rate at which it turns.
Choose an exact scale factor near this approximate
one which is a "round figure" in terms of obtainable
gear ratios, i.e., factorable into a small
number of simple rationals.
3. Choose in the same way scale factors for all
shafts of limited displacement - integrator inputs
and function table inputs, and outputs - so as not
to exceed their limits with expected displacements.
4. This fixes p by division, and from the integrating
factor of the integrators, the scale factors and
gear ratios of all shafts except those containing
adders. In the case of adders the input shaft with
smallest scale factor fixes the scale factor of the
adder, the other input being geared down to the same
scale factor. The output gear in the adder is then
fixed.
5. The set up is then inspected to see that no
integrators or other parts are too heavily loaded.
If they are, reduction gears are transferred from
inputs to outputs to reduce loads when possible;
otherwise the scale factor on the independent
variable is increased.
In case the ratios come out too complicated dif-
ferent scale factors are chosen in Step 3. With a little
practice and foresight, however, it is possible to obtain
suitable ratios on the first trial.
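Steps 2 to 4 lend themselves to a small search. The sketch below is a hypothetical aid, not the M.I.T. procedure: the stock of simple ratios is an assumed example (the appendix does not list the gears available), a "round figure" is taken to be a product of at most three such ratios, and scale factors are assumed to be measured so that a smaller factor is obtained by gearing down.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from typing import Optional, Tuple

# Assumed stock of simple gear ratios; the appendix does not list the actual gears.
SIMPLE_RATIOS = [Fraction(n, d) for n, d in
                 [(1, 2), (2, 3), (3, 4), (4, 5), (1, 1),
                  (5, 4), (4, 3), (3, 2), (2, 1)]]

def round_figure(target: float, max_gears: int = 3) -> Tuple[Fraction, Tuple[Fraction, ...]]:
    """Steps 2 and 3: the product of at most max_gears simple ratios nearest target."""
    best: Optional[Tuple[float, Fraction, Tuple[Fraction, ...]]] = None
    for k in range(1, max_gears + 1):
        for combo in combinations_with_replacement(SIMPLE_RATIOS, k):
            value = Fraction(1)
            for r in combo:
                value *= r
            error = abs(float(value) - target)
            if best is None or error < best[0]:   # fewer gears win ties
                best = (error, value, combo)
    assert best is not None
    return best[1], best[2]

def adder_gearing(s1: Fraction, s2: Fraction) -> Tuple[Fraction, Fraction, Fraction]:
    """Step 4: the adder takes the smaller input scale factor; each input is
    geared by (adder scale factor) / (its own scale factor)."""
    s = min(s1, s2)
    return s, s / s1, s / s2

print(round_figure(2.7))                            # nearest round figure to 2.7
print(adder_gearing(Fraction(1, 2), Fraction(1, 3)))
```

Preferring the fewest gears on ties reflects the appendix's aim of ratios factorable into a small number of simple rationals; for the adder example the input with scale factor 1/2 is geared down by 2/3 to meet the other input at 1/3.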
Two New Circuits for Alternate Pulse Counting
The well known W-Z relay circuit is shown in
Fig. 1. A is a pulsing contact which is alternately opened
and closed. Indicating closure of contacts by 0 and openness
by 1, and for relays 0 for operated (up) and 1 for
unoperated (down), the circuit goes through the following
periodic cycle of operation:
      A   W   Z
      1   1   1
      0   0   1
      1   0   0
      0   1   0
      1   1   1
Thus one complete cycle requires two complete pulses on A.
This note describes two apparently new circuits
which perform the same function. These are shown in Fig. 2
and Fig. 3. The operating cycles for these are:
        Fig. 2            Fig. 3
      A   W   Z         A   W   Z
      1   0   1         1   1   1
      0   0   0         0   0   1
      1   1   0         1   0   0
      0   1   1         0   1   0
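The cycle tables can be summarized by which relay moves on a closure of A and which on an opening. The sketch below is a behavioral model only, read off the tables rather than derived from the contact networks: for Fig. 1 (whose table Fig. 3 shares), W takes the complement of Z when A closes and Z copies W when A opens; for Fig. 2 the roles are exchanged. The names `RULES` and `run` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]            # (W, Z); 0 = operated (up), 1 = unoperated (down)
Update = Callable[[State], State]

# Behavioral summary of the cycle tables, not a model of the contact networks.
RULES: Dict[str, Tuple[Update, Update]] = {
    # Fig. 1 (Fig. 3 has the same table): on closure W <- Z', on opening Z <- W.
    "fig1": (lambda s: (1 - s[1], s[1]), lambda s: (s[0], s[0])),
    # Fig. 2: on closure Z <- W, on opening W <- Z'.
    "fig2": (lambda s: (s[0], s[0]), lambda s: (1 - s[1], s[1])),
}

def run(circuit: str, start: State, pulses: int) -> List[Tuple[int, int, int]]:
    """Rows (A, W, Z) of the cycle, starting with A open (A = 1)."""
    on_close, on_open = RULES[circuit]
    state = start
    rows = [(1, *state)]
    for _ in range(pulses):
        state = on_close(state)    # A closes: A = 0
        rows.append((0, *state))
        state = on_open(state)     # A opens: A = 1
        rows.append((1, *state))
    return rows

for row in run("fig2", (0, 1), 2):
    print(*row)
```

In both rules each relay changes once per complete pulse of A and the pair returns to its starting state after two pulses, which is what makes either relay usable as an alternate-pulse counter.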
These three circuits may be compared with regard
to the number of elements required as follows:
              Relays   Contacts                     Resistances
Figure 1      2        1 continuity, 1 transfer     2
Figure 2      2        2 continuity, 1 break        1
Figure 3      2        2 transfer, 1 make           1
In Fig. 3 the resistance is theoretically superfluous;
if the transfer elements could be trusted never to be shorted
it could be omitted, but in practice it would be necessary to
avoid shorts when the relays were being adjusted. Figs. 2 and
3 are essentially duals, and 3 was obtained from 2 by the
duality theorem.
In Fig. 2 it may be noted that the two relays are both
up when A is closed, while in the standard circuit they are both
up when A is open. This might be desirable in some applications.
Fig. 3 has the possible disadvantage that both ends of the
pulsing contact A are connected into the circuit, while in 1
and 2 one end can be grounded.
C. E. SHANNON
Att.
Figs. 1, 2, 3
[FIG. 1, FIG. 2, FIG. 3: relay circuit diagrams with pulsing contact A and relays W and Z.]
Counting Up or Down with Pulse Counters
With binary counters of either relay or electronic
type it is possible by simple modification to count both up and
down. Suppose the largest number that can be registered is N.
Defining the complement of any number n by N - n = n', we note
that subtracting a number m from n is equivalent to adding m to its
complement n' and then complementing the result, since
N - (n' + m) = n - m. Thus if in a binary counter
we take the complement of a reading (which means locking up the
relays which are down and vice versa in the relay case, and
putting out the tubes which are conducting and vice versa in the
electronic case), then let the counter continue and add the number
of pulses in question, and finally take the complement again, we
have subtracted the number. Actually, however, this process can
be done simply by transferring the carryover leads to the opposite
digit (tube or relay). In the relay case this amounts to a transfer
contact between each adjacent pair of digits, and an additional
make contact. In the electronic case the carryover leads go from
the "one" tube plate to trigger the next stage. Here we could
insert an electronic transfer contact, as shown, for example, in
Figure 1. When we wish to add, the common control lead for "add"
is given cutoff voltage, the "subtract" lead a large negative
voltage. A positive impulse on the "one" plate of a stage then causes
one side of the double triode to conduct, giving a negative impulse
to the next grid for a carryover. For subtraction the voltages
on the control leads are reversed and carryover occurs when the
"zero" plate voltage increases, i.e., when this tube goes out.
C. E. SHANNON
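The complement argument of this note can be checked in a few lines. The sketch below assumes a counter of `digits` binary stages, so that N = 2**digits - 1 and complementing means inverting every digit. `ripple_pulse` is a hypothetical software analogue of moving the carryover lead to the opposite digit: counting up, a stage passes a carry when it falls from 1 to 0; counting down, when it rises from 0 to 1. All names are illustrative.

```python
from typing import List

def complement(n: int, digits: int) -> int:
    """Complement with respect to N = 2**digits - 1: every binary digit inverted."""
    return n ^ ((1 << digits) - 1)

def add_pulses(n: int, pulses: int, digits: int) -> int:
    """An ordinary up-counter, wrapping round past N."""
    return (n + pulses) % (1 << digits)

def subtract_by_complement(n: int, m: int, digits: int) -> int:
    """Complement, count m pulses up, complement again: N - ((N - n) + m) = n - m."""
    return complement(add_pulses(complement(n, digits), m, digits), digits)

def ripple_pulse(bits: List[int], down: bool = False) -> None:
    """One input pulse into a chain of binary stages (least significant first).

    Each stage flips; the carry goes on when the stage falls 1 -> 0 (counting up)
    or, with the carry taken from the opposite digit, when it rises 0 -> 1."""
    for i in range(len(bits)):
        bits[i] ^= 1
        if bits[i] != (1 if down else 0):   # no carry out of this stage
            break

def to_bits(n: int, digits: int) -> List[int]:
    return [(n >> k) & 1 for k in range(digits)]

def from_bits(bits: List[int]) -> int:
    return sum(b << k for k, b in enumerate(bits))

bits = to_bits(13, 5)
for _ in range(5):
    ripple_pulse(bits, down=True)
print(from_bits(bits), subtract_by_complement(13, 5, 5))   # both give 13 - 5
```

The test exhausts every reading and pulse count of a five-digit counter, confirming that the transferred carry and the complement, count, complement procedure give the same result, modulo 2**digits.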
Cover Sheet for Technical Memoranda
Research Department
subject: Circuits for a P.C.M. Transmitter and Receiver - Case 20878
ROUTING:
1 - S.A.S., H.W.B., H.F.
2 - CASE FILES
[...] - G.W. Gilman
5 - H.W. Bode
6 - A.G. Jensen
7 - W.M. Goodall
8 - E. Peterson
9 - H.S. Black
10 - W.F. Simpson - Patent Dept.
11 - J.R. Pierce
12 - R.L. Dietzold
13 - C.B. Feldman
14 - W.T. Wintringham
15 - F.B. Llewellyn
16 - C.H. Elmendorf
17 - B.M. Oliver
18 - C.E. Shannon
MM-44-110-37
DATE June 1, 1944
AUTHORS C.E. Shannon and B.M. Oliver
ABSTRACT
Circuits are described for a P.C.M. transmitter
and receiver. The transmitter operates on the principle
of counting in the binary system the number of quanta
of charge required to nullify the sampled voltage.
Circuits for a P.C.M. Transmitter and Receiver - Case 20878
MM-44-110-3
June 1, 1944
MEMORANDUM FOR FILE
The circuits shown in the present memorandum are
intended to fill the boxes of the block functional designs
for a PCM transmitter and receiver shown in Fig. 6 of a December
1943 memorandum (MM-43-110-43). The transmitter functional
diagram is shown here as Fig. 1 and the general operation
is as follows. The incoming signal is sampled periodically
by closing the electronic switch 1 with periodic impulses
from the timer. This charges condenser C to the sampled
voltage and the electronic switch opens after each impulse
isolating the condenser from the signal. The existence of
a voltage across the condenser causes the comparator to close
electronic switch 2 which allows pulses of charge to feed
into the condenser from the pulse generator, discharging the
condenser. The number of these pulses is counted in the
binary system by the binary counter and when the condenser
is reduced to a reference voltage, the comparator opens elec-
tronic switch 2. Near the end of the sampling period the
binary counter is connected to the distributer which registers
the binary number counted, and the counter is then reset to
zero, both of these operations being controlled by impulses from the
timer. The distributer then sends a series of pulses or not
down the output line according as the binary digits are
1 or 0. These digits are sent in reverse order, the least
significant being sent first, to tie in with the contemplated
receiver circuit.
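The counting principle of the transmitter can be modeled in a few lines. This is a hypothetical software analogue, not a description of the circuits: `quantum` stands for the voltage step produced by one pulse of charge, `reference` for the comparator threshold, and the five-digit default is an assumption chosen to match the five counter leads (11 to 15) mentioned for the distributer.

```python
from typing import List, Tuple

def pcm_encode(sample: float, quantum: float, digits: int = 5,
               reference: float = 0.0) -> Tuple[int, List[int]]:
    """Counting coder sketch: remove equal quanta of charge from the sampled
    voltage until the comparator sees the condenser at or below the reference,
    counting the quanta in binary. Returns the count and its binary digits,
    least significant first (the order in which they are sent)."""
    limit = (1 << digits) - 1                 # largest reading the counter holds
    voltage, count = sample, 0
    while voltage > reference and count < limit:   # comparator holds switch 2 closed
        voltage -= quantum                    # one pulse of charge from the generator
        count += 1                            # one step of the binary counter
    bits = [(count >> k) & 1 for k in range(digits)]
    return count, bits

print(pcm_encode(1.3, 0.25))   # six quanta bring 1.3 down past zero
```

Counting stops at the first quantum that carries the condenser to or below the reference, so the code word is the number of quanta rounded up; a sample too large for the counter simply saturates at the largest reading.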
The specific circuits are shown in Figs. 2 to 8, and
detailed descriptions of their operation follow.
Fig. 2 shows the electronic switch 1 which charges the
condenser C to the signal voltage at the sampling times. The
signal wave is biased up so that its minimum value is slightly
positive, and impressed on terminal 1 as a voltage; i.e., the
signal source as seen from terminal 1 is assumed to be of low
impedance. The timer, at the sampling time puts a positive
pulse on terminal 2, which is inverted by the triode to give
a negative pulse on the pentode control grid. This causes the
pentode which was previously conducting to cut off. Before
the pulse, condenser C had a small minimum positive charge
and neither diode was conducting since the plates were held
at a low positive potential by the pentode current. As the
pentode cuts off, the diode plates swing positive and the right
hand diode starts to conduct, charging the condenser. As this
condenser voltage builds up exponentially the voltage on the
diode plates also increases positively until it reaches the
signal voltage, and at that instant the left hand diode starts
to conduct. The voltage stops rising at this point since the
plates are now essentially short circuited to the low impedance
signal source. This all occurs during the timing pulse, and
at the end of this pulse the pentode again starts conducting,
dropping the diode plates to a small positive voltage, less
than the minimum signal voltage, and isolating the condenser.
THIS DOCUMENT CONTAINS INFORMATION AFFECTING THE
NATIONAL DEFENSE OF THE UNITED STATES WITHIN THE MEANING
OF THE ESPIONAGE ACT, 50 U. S. C. 31 AND 32. ITS TRANSMISSION
OR THE REVELATION OF ITS CONTENTS IN ANY MANNER
TO AN UNAUTHORIZED PERSON IS PROHIBITED BY LAW.
Fig. 3 shows a standard multi-vibrator circuit for
giving a series of square pulses. The coil condenser cross
connection of plates to grids causes the grid transient to
be a cosine curve which crosses the cut off grid voltage at
a time determined essentially by the LC product and independent
of amplitude changes due to variations in plate supply, etc.
As this point determines the period of oscillation, the
oscillator has good frequency stability. The output appears
on terminal 6 as a square wave.
Fig. 4 is the comparator, which is actually only a
differential amplifier with sufficient gain so that the
granularity voltage applied to the input is capable of
driving the amplifier from saturation in one direction to
saturation in the other. The input is the voltage on condenser
C which immediately after a sampling instant, will be at the
sampled signal voltage. This voltage starts decreasing by
steps as the condenser is discharged and when the condenser
voltage applied to terminal 3 moves down the step which crosses
the differential amplifier threshold, the amplifier swings from
saturation with output terminal 5 at nearly zero voltage to
a high negative voltage.
The electronic switch 2 is shown in Fig. 5. This
circuit sends units of charge into the condenser through
terminal 3 under the control of the comparator output coming
in on terminal 5. The multi-vibrator output is connected to
terminal 6 and the output of the multi-grid tube will be a
square wave when 5 is positive, which ceases when the
comparator swings to the other saturation point driving the
voltage on 5 in the negative direction. The double diode
connection gives a pump action. When the plate voltage of
the multi-grid tube increases to the upper part of the square
wave, the charge flows into the condenser from terminal 4
through the left diode. During the lower part of this wave
the condenser discharges through the right diode out into the
condenser C, via terminal 3. As this causes the potential of
3 to decrease gradually down a step function, it is necessary
for the input voltage at 4 to decrease similarly; otherwise
the difference in voltage between 3 and 4 would cause the size
of quanta to decrease gradually. This lowering of the voltage
on 4 is accomplished by a cathode follower arrangement on the
first cathodes in the comparator, which follow the step voltage
down.
The binary counter is shown in Fig. 6. The descending
step voltage which appears on condenser C is applied to the
input of this circuit through terminal 3. The input resistance
condenser combination serves as a differentiating circuit (the
time constant fairly small compared to the time between steps)
so that the voltage applied to the first grid of the double
triode consists of a series of negative spikes. The double
triode is simply a two stage resistance coupled amplifier, and
its output feeds the binary counter digit tubes. This circuit
is of standard type with two pentodes in each stage and there
are two stable points for each stage, one with the upper tube
cut off and the lower tube conducting, and the other, the con-
verse situation. A negative impulse from a preceding stage
applied through the coupling condensers changes the state from
the previous stable condition to the opposite one. This impulse
is applied symmetrically to both suppressors, but the condenser
across the cathode resistances, charged in one direction from
the previous state, biases the choice of the next state toward
the opposite one. The control grids of the "zero" tubes (the
upper row which are conducting when the corresponding binary
digits are zero) are connected to a common control lead which
is used to reset the reading to zero after the reading is reg-
istered by the distributor. This is accomplished by a neg-
ative impulse from the timer. The outputs to the distributer
are taken off the plates of the "unit" tubes.
The distributer is shown in Fig. 7. After the
number of quanta of charge has been counted in the binary
counter, the leads 11, 12, 13, 14, 15 will have either low
positive voltages or B+, according as the corresponding digit
is one or zero. The grids of the left triodes will then be
either negative or positive from the potentiometer action
to the negative voltage C-. To register the counter reading,
a positive pulse from the timer is applied to the control
grid of the common pentode allowing it to conduct and pulling
the cathode of the left triode and the diode in all stages
negatively. If a digit is zero, the potential of the cathodes
in that stage stops at a positive value due to current through
the triode and the diode does not conduct. If the digit is
one the cathodes are pulled negative and the corresponding
condenser C0 is discharged through the diode and pentode.
At the end of the registering pulse, the cathodes go positive
again, isolating each C0, with the digit registered as
presence or absence of charge. The reading is taken off the
series of condensers C0 in sequence by positive pulses from
the timer on leads 21, 22, 23, 24, 25. These pulses allow
the right hand triodes to conduct and each C0 in turn to
charge through the output lead, leaving them in the normal
state (at a voltage about equal to the pulse voltage). If
the digit is "zero" no charge of C0 from the output lead
occurs. Thus negative pulses appear on the output when and
only when the registered digits are one.
The timer system is shown in Fig. 8. An oscillator
which may be synchronized subharmonically with the pulse
generating multi-vibrator, operates at the sampling frequency.
This passes through the clipper amplifier to give a square
wave, which is differentiated to give alternating positive
and negative spikes. A second clipper amplifier eliminates
the negative spikes and makes the positive ones rectangular.
These short rectangular pulses are fed into a delay line
terminated in its characteristic impedance. The timing pulses
needed for the various circuit functions are tapped off at
the appropriate places as indicated. A synchronizing pulse
may also be taken off the same delay line.
Fig. 9 shows the receiver circuit. The signal
passes through the clipping amplifier which is adjusted to give
a saturation voltage on the output if a pulse is present and
none if absent. This output is applied to the grid of a
multigrid pentode, whose other control grid is given positive
gating pulses at the center of the digit intervals. These
gating pulses allow the pentode to conduct if a pulse is present
and the plate current is then independent of the plate voltage
(providing this stays within certain limits) so that if a
pulse is present, a fixed amount of charge (equal to the
length of the gate times the pentode current) flows onto the
condenser. The time constant of the R C system (including the
pentode load resistance) is adjusted to allow the voltage to
restore itself halfway toward the equilibrium value in the
time from one digit to the next, so that after all pulses
have been collected on the condenser, the charge contributions
of the first, second, third, etc. digits have decayed by factors of
$\frac{1}{2^{n-1}}, \frac{1}{2^{n-2}}, \ldots, 1$ for an $n$-digit code. At this time a positive gating pulse is put
on the grid of the second pentode, allowing the condenser to
discharge rapidly into the low pass filter. The timer system
can be realized with the systems shown in either Fig. 10 or
Fig. 11.
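The receiver's decoding by exponential decay can be sketched numerically. This hypothetical model assumes an ideal gate that deposits exactly one unit of charge per received pulse and an R C decay of exactly one half per digit interval; digits arrive least significant first, so the first digit ends up weighted $1/2^{n-1}$ and the last digit weighted 1.

```python
from typing import List

def pcm_decode(bits: List[int], charge: float = 1.0) -> float:
    """Receiver sketch: each received pulse adds a fixed charge to the condenser,
    and the R C decay halves the stored voltage between successive digits.
    Returns the voltage presented to the low pass filter after the last digit."""
    voltage = 0.0
    for b in bits:                 # least significant digit first
        voltage *= 0.5             # halfway decay toward equilibrium (zero here)
        voltage += charge * b      # gated pentode deposits a fixed charge if a pulse is present
    return voltage

print(pcm_decode([0, 1, 1, 0, 0]))   # digits of 6, least significant first
```

For an n-digit code the result is the transmitted count divided by $2^{n-1}$, so the halving decay alone supplies the binary weights; with five digits a count of 6 gives 6/16.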
C. E. SHANNON
B. M. OLIVER
Att.
Figs. 1 to 11
Pulse Shape to Minimize Band Width with Nonoverlapping Pulses
We consider the problem of shaping pulses $\varphi(t)$ which
are zero outside $(-\pi, \pi)$ in such a way as to minimize the band
width of the power spectrum of the ensemble of functions formed
by sending a sequence of the functions $\varphi(t)$ and $0$, with spacing
$2\pi$, the probabilities of either being 1/2. First a theorem will be
proved on the spectrum of such ensembles of functions.
Theorem: Let an ensemble of functions be defined by
$f(t) = \sum_{n=-\infty}^{\infty} a_n\, \varphi(t + 2\pi n)$
where the $a_n$ are chosen independently and are equally likely to
be one or zero. The power spectrum of $f(t)$ then consists of
two parts, a point spectrum consisting of the spectrum of
$\frac{1}{2} \sum_n \varphi(t + 2\pi n)$, i.e., the spectrum of $\varphi(t)$ repeated, and a
continuous part consisting of the energy spectrum of $\varphi(t)$.
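Consider the autocorrelation of $f(t)$:
$\psi(\tau) = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} f(t)\, f(t+\tau)\, dt = \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \sum_n a_n \varphi(t + 2\pi n) \sum_m a_m \varphi(t + \tau + 2\pi m)\, dt.$
The integrand can be written
$\sum_{n \neq m} a_n a_m\, \varphi(t + 2\pi n)\, \varphi(t + \tau + 2\pi m) + \frac{1}{4} \sum_n \varphi(t + 2\pi n)\, \varphi(t + \tau + 2\pi n) + \sum_n \left(a_n^2 - \frac{1}{4}\right) \varphi(t + 2\pi n)\, \varphi(t + \tau + 2\pi n).$
When we average, the sum of the first two parts gives the
autocorrelation of the function $\frac{1}{2} \sum_n \varphi(t + 2\pi n)$, since the coefficients
$a_n a_m$ ($n \neq m$) have one chance in four of being both equal to one,
and in the second term the coefficient $\frac{1}{4}$ has the same mean value.
Since $a_n^2 = a_n$ has mean value $\frac{1}{2}$, the coefficient $a_n^2 - \frac{1}{4}$ of the
last term has mean value $\frac{1}{4}$, and the last term in the limit reduces to
$\frac{1}{4} \cdot \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(t)\, \varphi(t + \tau)\, dt,$
the factor $\frac{1}{2\pi}$ compensating for the number of terms.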
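These two parts give the discrete and continuous parts
of the spectrum, the first being the autocorrelation of $\varphi(t)$
repeated and the second giving the energy spectrum of $\varphi(t)$.
In case $\varphi(t) = 0$ outside $(-\pi, \pi)$, the discrete part has
power at $\omega = 0, 1, 2, 3, \ldots$ according to
$\varphi(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos nt + \sum_{n=1}^{\infty} b_n \sin nt, \qquad -\pi < t < \pi,$
where here $a_n$, $b_n$ denote the Fourier coefficients of $\varphi$, not the random coefficients of the theorem.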
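Suppose we wish to shape $\varphi(t)$ lying within $(-\pi, \pi)$ so
as to minimize the band spread of the spectrum as
measured by
$W = \int_0^\infty \omega^2 P(\omega)\, d\omega,$
where $P(\omega)$ is the power spectrum. The contributions of the two parts of the spectrum can be added,
and that from the discrete part is [...].
For the continuous part, using the theorem that $\int \omega^2 |F(\omega)|^2\, d\omega = \int [f'(t)]^2\, dt$,
where $f(t)$ and $F(\omega)$ are Fourier transforms (with the symmetric $1/\sqrt{2\pi}$ normalization), we have
$\int_{-\pi}^{\pi} [\varphi'(t)]^2\, dt = \pi \sum_{n=1}^{\infty} n^2 \left(a_n^2 + b_n^2\right) = \pi\left[(a_1^2 + b_1^2) + 4(a_2^2 + b_2^2) + 9(a_3^2 + b_3^2) + \cdots\right],$
i.e., the same as the discrete contribution. The total $W$ is therefore [...].
To minimize $W$ with a fixed total energy per pulse
and with boundary conditions $\varphi(\pi) = \varphi(-\pi) = 0$, we must obviously
place all the energy in the first term, a cosine curve displaced
so as to be tangent to the $t$ axis.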
Cover sheet for technical memoranda
Research Department
Subject: A Mathematical Theory of Cryptography - Case 20878
Routing:
1 - Case Files
2 - Case Files
3 -
4 -
5 - H. S. Black
6 - F. B. Llewellyn
7 - H. Nyquist
8 - B. M. Oliver
9 - R. K. Potter
10 - C. B. H. Feldman
11 - R. C. Mathes
12 - R. V. L. Hartley
13 - J. R. Pierce
14 - H. W. Bode
15 - R. L. Dietzold
16 - L. A. MacColl
17 - W. A. Shewhart
18 - S. A. Schelkunoff
19 - C. E. Shannon
20 - Dept. 1000 Files
MM-45-110-92
Date: September 1, 1945
Author: C. E. Shannon
Index No. P0.4
ABSTRACT
A mathematical theory of secrecy systems is developed. Three main problems are considered. (1) A logical formulation of the problem and a study of the mathematical structure of secrecy systems. (2) The problem of "theoretical secrecy," i.e., can a system be solved given unlimited time, and how much material must be intercepted to obtain a unique solution to cryptograms? A secrecy measure called the "equivocation" is defined and its properties developed. (3) The problem of "practical secrecy." How can systems be made difficult to solve, even though a solution is theoretically possible?
A Mathematical Theory of Cryptography - Case 20878
MM-45-110-92
September 1, 1945
Index P0.4
Introduction and Summary
In the present paper a mathematical theory of cryptography and secrecy systems is developed. The entire approach is on a theoretical level and is intended to supplement the treatment found in standard works on cryptography.* There, a detailed study is made of the many standard types of codes and ciphers, and of the ways of breaking them. We will be more concerned with the general mathematical structure and properties of secrecy systems.
The presentation is mathematical in character. We first define the pertinent terms abstractly and then develop our results as lemmas and theorems. Proofs which do not contribute to an understanding of the theorems have been placed in the appendix.
The mathematics required is drawn chiefly from
probability theory and from abstract algebra. The reader is
assumed to have some familiarity with these two fields. A
knowledge of the elements of cryptography will also be help-
ful although not required.
The treatment is limited in certain ways. First, there are two general types of secrecy system: (1) concealment systems, including such methods as invisible ink, concealing a message in an innocent text or in a fake covering cryptogram, or other methods in which the existence of the message is concealed from the enemy; and (2) "true" secrecy systems, where the meaning of the message is concealed by cipher, code, etc., although its existence is not hidden. We consider only the second type; concealment systems are more a psychological than a mathematical problem. Secondly, the treatment is limited to the case of discrete information, where the information to be enciphered consists of a sequence of discrete symbols, each chosen from a finite set. These symbols may be letters in a language, words of a language, amplitude levels of a "quantized" speech or video signal, etc., but the main emphasis and thinking have been concerned with the case of letters. A preliminary survey indicates that the methods and analysis can be generalized to study continuous cases, and to take into account the special characteristics of speech secrecy systems.

*See, for example, H. F. Gaines, "Elementary Cryptanalysis," or M. Givierge, "Cours de Cryptographie."
The paper is divided into three parts. The main results of these parts will now be briefly summarized. The first part deals with the basic mathematical structure of language and of secrecy systems. A language is considered for cryptographic purposes to be a stochastic process which produces a discrete sequence of symbols in accordance with some system of probabilities. Associated with a language there is a certain parameter D which we call the redundancy of the language. D measures, in a sense, how much a text in the language can be reduced in length without losing any information. As a simple example, if each word in a text is repeated, a reduction of 50 per cent is immediately possible. Further reductions may be possible due to the statistical structure of the language, the high frequencies of certain letters or words, etc. The redundancy is of considerable importance in the study of secrecy systems.
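The memo defines D precisely only later. As a rough numerical illustration, the sketch below assumes that the redundancy per letter can be approximated, in decimal digits, as log10 26 minus the single-letter entropy of a sample text. This first-order estimate ignores all dependence between letters, so it understates the true redundancy; in particular it cannot see the 50 per cent saving from repeated words mentioned above. The function name and sample strings are hypothetical.

```python
import math
from collections import Counter


def first_order_redundancy(text: str, alphabet_size: int = 26) -> float:
    """Rough redundancy per letter, in decimal digits.

    Assumption (not the memo's definition): log10(alphabet_size) minus the
    entropy of the observed single-letter frequencies.
    """
    letters = [c for c in text.upper() if c.isalpha()]
    total = len(letters)
    h1 = -sum((n / total) * math.log10(n / total) for n in Counter(letters).values())
    return math.log10(alphabet_size) - h1


# Repeating every word leaves the letter frequencies unchanged, so this
# first-order estimate does not detect the redundancy the repetition adds.
print(math.isclose(first_order_redundancy("THE CAT"),
                   first_order_redundancy("THE THE CAT CAT")))  # True
```

A text using all 26 letters equally often gives an estimate of zero, while a text of one repeated letter gives the maximum, log10 26.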
A secrecy system is defined abstractly as a set of transformations of one space (the set of possible messages) into a second space (the set of possible cryptograms). Each transformation of the set corresponds to enciphering with a particular key, and the transformations are supposed reversible (non-singular) so that unique deciphering is possible when the key is known.
Each key, and therefore each transformation, is assumed to have an a priori probability associated with it: the probability of choosing that key. The set of messages, or message space, is also assumed to have a priori probabilities for the various messages, i.e., to be a probability measure space.

In the usual cases the "messages" consist of sequences of "letters." In this case, as noted above, the message space is represented by a stochastic process which generates sequences of letters according to some probability structure.
These probabilities for various keys and messages are actually the enemy cryptanalyst's a priori probabilities for the choices in question, and represent his a priori knowledge of the situation. To use the system, a key is first selected and sent to the receiving point. The choice of a key determines a particular transformation in the set forming the system. Then a message is selected and the particular transformation applied to this message to produce a cryptogram. This cryptogram is transmitted to the receiving point by a channel that may be intercepted by the enemy. At the receiving end the inverse of the particular transformation is applied to the cryptogram to recover the original message.
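A minimal sketch of this definition, assuming a toy system that is not taken from the memo: the key is a shift k from 0 to 25 (a Caesar cipher), each key selects one invertible transformation T_k, and all 26 keys are equally likely. The names `encipher`, `decipher`, and `KEY_PROB` are illustrative.

```python
import random
import string

ALPHABET = string.ascii_uppercase            # message and cryptogram letters
KEY_PROB = {k: 1 / 26 for k in range(26)}    # a priori key probabilities (uniform here)


def encipher(message: str, k: int) -> str:
    """Transformation T_k: shift every letter forward by k places, mod 26."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in message)


def decipher(cryptogram: str, k: int) -> str:
    """Inverse transformation T_k^{-1}; it exists because T_k is one-to-one."""
    return encipher(cryptogram, -k % 26)


# Sender: select a key (sent separately to the receiving point), then encipher.
key = random.choices(list(KEY_PROB), weights=list(KEY_PROB.values()))[0]
cryptogram = encipher("ATTACKATDAWN", key)
# Receiver: knowing the key, apply the inverse transformation.
assert decipher(cryptogram, key) == "ATTACKATDAWN"
print(key, cryptogram)
```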
If the enemy intercepts the cryptogram he can calculate from it the a posteriori probabilities of the various possible messages and keys which might have produced this cryptogram. This set of a posteriori probabilities constitutes his knowledge of the key and message after the interception.* The calculation of these a posteriori probabilities is the generalized problem of cryptanalysis.
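The a posteriori calculation can be sketched for a toy shift cipher, assuming a small hypothetical message source with known a priori probabilities. Each (message, key) pair that produces the intercepted cryptogram contributes P(M)P(K); normalizing over those pairs gives the enemy's new probabilities. Exact fractions are used so the results can be read directly; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def shift(text: str, k: int) -> str:
    """Caesar transformation T_k on uppercase letters."""
    return "".join(chr((ord(c) - 65 + k) % 26 + 65) for c in text)


def posteriors(cryptogram, message_prior, key_prior):
    """Enemy's a posteriori probabilities P(M | E) and P(K | E).

    P(M, K, E) = P(M) P(K) when T_K(M) = E and 0 otherwise; dividing by P(E),
    the total over all consistent pairs, gives the posteriors.
    """
    joint = {(m, k): pm * pk
             for m, pm in message_prior.items()
             for k, pk in key_prior.items()
             if shift(m, k) == cryptogram}
    p_e = sum(joint.values())
    post_m, post_k = defaultdict(Fraction), defaultdict(Fraction)
    for (m, k), p in joint.items():
        post_m[m] += p / p_e
        post_k[k] += p / p_e
    return dict(post_m), dict(post_k)


# Hypothetical a priori message probabilities; "ADD" shifted by 1 is "BEE".
prior = {"ADD": Fraction(1, 2), "BEE": Fraction(1, 4), "CAT": Fraction(1, 4)}
keys = {k: Fraction(1, 26) for k in range(26)}
post_m, post_k = posteriors(shift("ADD", 5), prior, keys)
print(post_m)  # ADD: 2/3, BEE: 1/3 (CAT cannot produce this cryptogram)
print(post_k)  # key 5: 2/3, key 4: 1/3
```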
As an example of these notions, in a simple substitution cipher with random key there are 26! transformations, corresponding to the 26! ways we can substitute for 26 different letters. These are all equally likely and each therefore has an a priori probability 1/26!. If this is applied to "normal English," the cryptanalyst being assumed to have no knowledge of the message source other than that it is English, the a priori probabilities of various messages of N letters are merely their frequencies in normal English text.
If the enemy intercepts N letters of cryptogram in this system, his probabilities change. If N is large enough (say 50 letters) there is usually a single message of a posteriori probability nearly unity, while all others have a total probability nearly zero. Thus there is an essentially unique "solution" to the cryptogram. For N smaller (say N = 15) there will usually be many messages and keys of comparable probability, with no single one nearly unity. In this case there are multiple "solutions" to the cryptogram.
Considering a secrecy system to be a set of transformations of one space into another with definite probabilities associated with each transformation, there are two natural combining operations which produce a third system from two given systems. The first combining operation is called the product operation and corresponds to enciphering the message with the first system R and enciphering the resulting cryptogram with system S, the keys for R and S being chosen independently. This total operation is a secrecy system whose transformations consist of all the products (in the usual sense of products of transformations) of transformations in S with transformations in R. The probabilities are the products of the probabilities for the two transformations.
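A sketch of the product operation, assuming transformations are represented as permutations of the 26 letter indices and a system as a mapping from transformation to probability (a representation chosen here for illustration). The helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

SIZE = 26  # alphabet size; a transformation is a permutation of 0..25 stored as a tuple


def shift_perm(k):
    """Caesar shift by k, as a permutation of letter indices."""
    return tuple((i + k) % SIZE for i in range(SIZE))


def compose(s, r):
    """The transformation SR: apply r first, then s."""
    return tuple(s[r[i]] for i in range(SIZE))


def product(S, R):
    """Product system SR, the keys of R and S being chosen independently.

    Systems are dicts {transformation: probability}. Probabilities multiply, and
    different key pairs that yield the same overall transformation are merged.
    """
    result = defaultdict(Fraction)
    for s, ps in S.items():
        for r, pr in R.items():
            result[compose(s, r)] += ps * pr
    return dict(result)


# Two hypothetical systems: shift by 0 or 1, each with probability 1/2.
R = {shift_perm(0): Fraction(1, 2), shift_perm(1): Fraction(1, 2)}
S = {shift_perm(0): Fraction(1, 2), shift_perm(1): Fraction(1, 2)}
T = product(S, R)
for k in range(3):
    print(k, T[shift_perm(k)])  # 0 1/4, 1 1/2, 2 1/4
```

Here the key pairs (0, 1) and (1, 0) give the same overall shift, so SR contains three transformations rather than four, and their probabilities add.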
The second combining operation is "weighted addition":

$$T = pR + qS, \qquad p + q = 1.$$

It corresponds to making a preliminary choice as to whether system R or S is to be used, with probabilities p and q respectively. When this is done, R or S is used as originally defined.

*"Knowledge" is thus identified with a set of propositions having associated probabilities. We are here at variance with the doctrine often assumed in philosophical studies, which considers knowledge to be a set of propositions which are either true or false.
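Weighted addition can be sketched with the same kind of mapping from transformation to probability, here using string labels for hypothetical transformations. If R and S happen to share a transformation, its two contributions are added. The function name is illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def weighted_sum(p, R, S):
    """T = pR + qS with q = 1 - p: choose R with probability p or S with probability q,
    then choose a key within the chosen system as originally defined.

    Systems are dicts {transformation label: probability}; a transformation present
    in both R and S receives p * P_R + q * P_S.
    """
    q = 1 - p
    T = defaultdict(Fraction)
    for t, pr in R.items():
        T[t] += p * pr
    for t, ps in S.items():
        T[t] += q * ps
    return dict(T)


# Hypothetical systems, with transformations named rather than written out.
R = {"shift1": Fraction(1, 2), "shift2": Fraction(1, 2)}
S = {"shift2": Fraction(1, 3), "reverse": Fraction(2, 3)}
T = weighted_sum(Fraction(1, 4), R, S)
print(T)  # shift1: 1/8, shift2: 1/8 + 1/4 = 3/8, reverse: 1/2
```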
It is shown that secrecy systems with these two combining operations form essentially a "linear associative algebra" with a unit element, an algebraic variety that has been extensively studied by mathematicians. Some of the properties of this algebra are developed.
Among the many possible secrecy systems there is one type with many special properties. This type we call a "pure" system. A system is pure if for any three transformations $T_i$, $T_j$, $T_k$ in the set the product

$$T_i T_j^{-1} T_k$$

is also a transformation in the set, and all keys are equally likely. That is, enciphering, deciphering, and enciphering with any three keys must be equivalent to enciphering with some key.
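The purity condition can be checked by brute force on small systems. This sketch, with hypothetical helper names, tests the shift (Caesar) system, which is pure, and a two-key shift system, which is not. Simple substitution itself, with 26! keys, is far too large to enumerate this way.

```python
from itertools import product

SIZE = 26


def shift_perm(k):
    return tuple((i + k) % SIZE for i in range(SIZE))


def compose(a, b):
    """ab: apply b first, then a."""
    return tuple(a[b[i]] for i in range(SIZE))


def inverse(a):
    inv = [0] * SIZE
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)


def is_pure(system):
    """system: dict {transformation: probability}. Pure if all keys are equally
    likely and T_i T_j^{-1} T_k is in the set for every triple i, j, k."""
    transforms = set(system)
    if len(set(system.values())) != 1:
        return False
    return all(compose(ti, compose(inverse(tj), tk)) in transforms
               for ti, tj, tk in product(transforms, repeat=3))


caesar = {shift_perm(k): 1 / 26 for k in range(26)}
print(is_pure(caesar))                                    # True
print(is_pure({shift_perm(0): 0.5, shift_perm(1): 0.5}))  # False: T_1 T_0^{-1} T_1 is a shift by 2
```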
With a pure cipher it is shown that all keys are essentially equivalent: they all lead to the same set of a posteriori probabilities. Furthermore, when a given cryptogram is intercepted there is a set of messages that might have produced this cryptogram (a "residue class"), and the a posteriori probabilities of messages in this class are proportional to the a priori probabilities. All the information the enemy has obtained by intercepting the cryptogram is a specification of the residue class. Many of the common ciphers are pure systems, including simple substitution with random key. In this case the residue class consists of all messages with the same pattern of letter repetitions as the intercepted cryptogram.
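For simple substitution the residue class can be computed from the repetition pattern alone. This sketch assumes messages and cryptograms made of letters only (no spaces) and uses a hypothetical list of candidate words; it finds which candidates belong to the residue class, not their probabilities.

```python
def repetition_pattern(text: str) -> tuple:
    """Replace each letter by the order of its first appearance: LETTER -> (0, 1, 2, 2, 1, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(c, len(first_seen)) for c in text)


def residue_class(cryptogram: str, candidates):
    """Candidate messages that some simple-substitution key maps onto the cryptogram,
    i.e. those with the cryptogram's pattern of letter repetitions."""
    target = repetition_pattern(cryptogram)
    return [m for m in candidates if repetition_pattern(m) == target]


words = ["LETTER", "BETTER", "LITTLE", "PEPPER", "SETTEE"]  # hypothetical candidates
print(residue_class("XQZZQK", words))  # ['LETTER', 'BETTER']
```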
Two systems R and S are defined to be "similar" if there exists a fixed transformation A with an inverse, $A^{-1}$, such that

$$R = AS.$$

If R and S are similar, a one-to-one correspondence between the resulting cryptograms can be set up leading to the same a posteriori probabilities. The two systems are cryptanalytically the same.
The second main part of the paper deals with the problem of "theoretical security." How secure is a system against cryptanalysis when the enemy has unlimited time and manpower available for the analysis of intercepted cryptograms?

"Perfect secrecy" is defined by requiring of a system that, after a cryptogram is intercepted by the enemy, the a posteriori probabilities of this cryptogram representing various messages be identically the same as the a priori probabilities of the same messages before the interception. It is shown that perfect secrecy is possible but requires, if the number of messages is finite, the same number of possible keys. If the message is thought of as being constantly generated at a given "rate" R (to be defined later), key must be generated at the same or a greater rate.
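Perfect secrecy can be checked on a toy case, assuming two-letter messages over a 26-letter alphabet, a hypothetical a priori distribution, and keys added letter by letter mod 26 (a Vernam-style system). With all 26² keys equally likely the posterior equals the prior; restricting to the 26 Caesar keys, fewer than the number of possible messages, destroys perfect secrecy for this cryptogram. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def add_key(message, key):
    """Vernam-style encipherment: add key letters to message letters mod 26 (letters as 0..25)."""
    return tuple((m + k) % 26 for m, k in zip(message, key))


def posterior(cryptogram, message_prior, keys):
    """P(M | E) when every key in `keys` is equally likely."""
    weight = {m: p * sum(1 for k in keys if add_key(m, k) == cryptogram)
              for m, p in message_prior.items()}
    total = sum(weight.values())
    return {m: w / total for m, w in weight.items()}


# Hypothetical two-letter message source.
prior = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4), (2, 5): Fraction(1, 4)}
otp_keys = list(product(range(26), repeat=2))   # 676 keys: as many as two-letter messages
caesar_keys = [(k, k) for k in range(26)]       # 26 keys: the same shift on both letters
E = add_key((0, 0), (3, 3))
print(posterior(E, prior, otp_keys) == prior)   # True: perfect secrecy
print(posterior(E, prior, caesar_keys))         # all probability now on (0, 0)
```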
If a secrecy system with a finite key is used, and N letters of cryptogram are intercepted, there will be, for the enemy, a certain set of messages, with certain probabilities, that this cryptogram could represent. As N increases the field usually narrows down until eventually there is a unique "solution" to the cryptogram: one message with probability essentially unity while all others are practically zero. A quantity Q(N) is defined, called the equivocation, which measures in a statistical way how near the average cryptogram of N letters is to a unique solution; that is, how uncertain the enemy is of the original message after intercepting a cryptogram of N letters. Various properties of the equivocation are deduced; for example, the equivocation of the key never increases with increasing N. This quantity Q is a theoretical secrecy index, theoretical in that it allows the enemy unlimited time to analyse the cryptogram.
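The behavior of the key equivocation can be computed exactly for a very small system. The sketch below assumes a shift cipher on a 3-letter alphabet and a hypothetical source of independent letters with probabilities .7, .2, .1; it enumerates every cryptogram of length n, so it is only practical for small n. It illustrates the equivocation of the key, not necessarily the exact definition of Q(N) used in the body of the memo. Requires Python 3.8+ for `math.prod`.

```python
import math
from itertools import product


def key_equivocation(n, letter_probs):
    """H_E(K): average uncertainty of the key, in decimal digits, after n cryptogram letters.

    Toy system (an assumption, not from the memo): a shift cipher mod q applied to an
    i.i.d. source with the given letter probabilities, all q keys equally likely.
    """
    q = len(letter_probs)
    h = 0.0
    for e in product(range(q), repeat=n):
        # P(K = k, E = e) = P(k) * P(M = e - k), letter by letter
        joint = [math.prod(letter_probs[(c - k) % q] for c in e) / q for k in range(q)]
        p_e = sum(joint)
        h -= sum(pk * math.log10(pk / p_e) for pk in joint if pk > 0)
    return h


for n in range(7):
    print(n, round(key_equivocation(n, [0.7, 0.2, 0.1]), 4))
```

Starting from log10 3 at n = 0, the printed values do not increase with n. With a uniform source, `key_equivocation(n, [1/3, 1/3, 1/3])` stays at log10 3 for every n: a language with no redundancy never reveals the key, which is the kind of behavior the memo calls ideal.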
The function Q(N) for a certain idealized type of cipher called the random cipher is determined. With certain corrections this function can be applied to many cases of practical interest. This gives a way of calculating approximately how much intercepted material is required to obtain a solution to a secrecy system. It appears from this analysis that with ordinary languages and the usual types of ciphers (not codes) this "unicity distance" is approximately |K|/D. Here |K| is a number measuring the "size" of the key space. If all keys are a priori equally likely, |K| is the logarithm of the number of possible keys. D is the redundancy of the language and measures the excess information content of the language. In simple substitution with random key on English, |K| is $\log_{10} 26!$, or about 20, and D is about .7 for English. Thus unicity occurs at about 30 letters.
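The unicity estimate |K|/D is simple arithmetic. Computing log10 26! directly gives about 26.6 rather than 20; with D about 0.7 the same formula then gives roughly 38 letters, so the figure of about 30 is best read as an order-of-magnitude estimate. The function name is hypothetical.

```python
import math


def unicity_distance(key_size_digits: float, redundancy_digits: float) -> float:
    """Approximate unicity distance |K| / D, with |K| in decimal digits and D in digits per letter."""
    return key_size_digits / redundancy_digits


K = math.log10(math.factorial(26))        # |K| for simple substitution: 26! equally likely keys
print(round(K, 1))                        # 26.6
print(round(unicity_distance(20, 0.7)))   # 29, using the memo's rounded value of 20
print(round(unicity_distance(K, 0.7)))    # 38, using the computed value
```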
It is possible to construct secrecy systems with a finite key for certain "languages" in which the function Q(N) does not approach zero as N → ∞. In this case, no matter how much material is intercepted, the enemy still does not get a unique solution to the cipher but is left with many alternatives, all of reasonable probability. Such systems we call ideal systems. It is possible in any language to approximate such behavior, i.e., to make the approach to zero of Q(N) recede out to arbitrarily large N. However, such systems have a number of drawbacks, such as complexity and sensitivity to errors in transmission of the cryptogram.
The third part of the paper is concerned with "practical secrecy." Two systems with the same key size may both be uniquely solvable when N letters have been intercepted, but differ greatly in the amount of labor required to effect this solution. An analysis of the basic weaknesses of secrecy systems is made. This leads to methods for constructing systems which will require a large amount of work to solve. A certain incompatibility among the various desirable qualities of secrecy systems is discussed.
PART I
FOUNDATIONS AND ALGEBRAIC STRUCTURE OF SECRECY SYSTEMS
1. Choice, Information and Uncertainty
Suppose we have a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$. These probabilities are known, but that is all we know concerning which event will occur. Can we define a quantity which will measure in some sense how "uncertain" we are of the outcome? How much "choice" is involved in the selection of the event by the chance element that operates with these probabilities? We propose as a numerical measure of this rather vague notion the quantity

$$H = -\sum_{i=1}^{n} p_i \log p_i.$$
There are many reasons for this particular formula. Quantities of this kind appear continually in the present paper and in the study of the transmission of information.
To justify this definition we will state a number of properties that follow from it. These properties will not be proved here,* but are easily deduced from the definition.
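A direct transcription of the formula, as a sketch; the `base` parameter anticipates the choice of units discussed at the end of this section, and the function name is hypothetical.

```python
import math


def entropy(probs, base=10):
    """H = -sum p_i log p_i. Base 10 gives "digits", base 2 gives "alternatives".
    Zero probabilities are skipped, since p log p -> 0 as p -> 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # log10 4, about 0.602: the maximum for n = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # smaller: the outcome is less uncertain
print(entropy([0.5, 0.5], base=2))        # 1.0 alternative
```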
Properties of $H = -\sum p_i \log p_i$:
1. H = 0 if and only if all the $p_i$ but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish.
2. For a given n, H is a maximum and equal to log n if and only if all the $p_i$ are equal (i.e., 1/n). This is also intuitively the most uncertain situation.
3. Suppose there are two events in question, with m possibilities for the first and n for the second. Let $p_{ij}$ be the probability of the joint occurrence of i for the first and j for the second. The uncertainty of the joint event is

$$H = -\sum_{i,j} p_{ij}\log p_{ij}.$$

For given probabilities $p_i = \sum_j p_{ij}$ for the first and $q_j = \sum_i p_{ij}$ for the second, the quantity H is maximized if and only if the events are independent, i.e., $p_{ij} = p_i q_j$. This maximum value is the sum of the individual uncertainties:

$$H = H_1 + H_2 = -\sum_i p_i\log p_i - \sum_j q_j\log q_j.$$

These facts can be generalized to any number of different events.

*It is intended to develop these results in coherent fashion in a forthcoming memorandum on the transmission of information.
4. Suppose there are two chance events A and B as in 3, not necessarily independent. We define the mean conditional uncertainty of B, knowing A, as

$$H_A(B) = \sum_a p(a)\,H_a(B),$$

where $H_a(B)$ is the uncertainty of B when A has the definite value a. Thus $H_A(B)$ is the average uncertainty of B for all different events A, weighted according to their different probabilities of occurrence. The uncertainty of the joint event is the sum of the uncertainty of the first and the mean conditional uncertainty of the second. In symbols,

$$H(A,B) = H(A) + H_A(B).$$

This is true whether or not there are any causal connections or correlations between the two events.
5. In the same situation the uncertainty of B is not greater than the joint uncertainty H(A,B):

$$H(B) \le H(A,B).$$

The equality holds if and only if every B (of probability greater than zero) is consistent with only one A, that is, if A is uniquely determined by B.
6. From properties 3 and 4 we have

$$H(A) + H(B) \ge H(A,B),$$
$$H(B) \ge H(A,B) - H(A) = H(A) + H_A(B) - H(A),$$
$$H(B) \ge H_A(B).$$

Thus the average uncertainty of B when we know A is not greater than the uncertainty of B alone. Additional information never increases average uncertainty. The equality holds if and only if A and B are independent.
7. Suppose we have a set of probabilities $p_1, p_2, \ldots, p_n$. Any change toward equalization of these (supposing them unequal) increases H. Thus if $p_1 < p_2$ and we increase $p_1$, decreasing $p_2$ by an equal amount (to keep the sum $\sum p_i$ constant at unity) so that $p_1$ and $p_2$ are more nearly equal, then H increases. More generally, if we perform any "averaging" operation on the $p_i$ of the form

$$p_i' = \sum_j a_{ij}\,p_j,$$

where $\sum_i a_{ij} = \sum_j a_{ij} = 1$ and all $a_{ij} \ge 0$, then H increases (except in the special case where this transformation amounts to no more than a permutation of the $p_i$, with H of course remaining the same).
8. H measures in a certain sense how much "information is generated" when the choice is made. Suppose such a chance event occurs and we wish to describe which of the n possible events took place. The average amount of paper required to write it down in a properly chosen notation is, in the cases of interest to us, about proportional to H. Thus there might be $10^{30} + 10^{50}$ possible events, with $10^{30}$ of them having a probability of $\tfrac12\cdot10^{-30}$ and $10^{50}$ of them having a probability of $\tfrac12\cdot10^{-50}$. We could set up a notational system to describe which event occurs as follows. We number the events from 1 up to $10^{30} + 10^{50}$ and, when one occurs, write down the corresponding number. The average amount of paper required will be proportional to the average number of digits we need. This will be nearly 30 if the event is in the first group of $10^{30}$, and about 50 if in the second group. Thus the average number of digits is about 40. We also have

$$H = -10^{30}\cdot\tfrac12 10^{-30}\log_{10}\left(\tfrac12 10^{-30}\right) - 10^{50}\cdot\tfrac12 10^{-50}\log_{10}\left(\tfrac12 10^{-50}\right) = \tfrac12\log_{10}\left(2\cdot10^{30}\right) + \tfrac12\log_{10}\left(2\cdot10^{50}\right) \approx 40.$$
9. Although the last result is only approximately true when the number of choices is finite, it becomes exactly true when an unlimited sequence of choices is made. Thus if a sequence of N independent choices is made, each choice being from n possibilities with probabilities $p_1, p_2, \ldots, p_n$, then the total amount of information generated is

$$H = -N\sum_i p_i\log p_i.$$

If N is sufficiently large, the expected number of digits required to register the particular choice made is, in ratio, arbitrarily close to H, provided the correspondence between sequences of digits and sets of choices is correctly made. If incorrectly made, it will be greater than H. Moreover, if N is sufficiently large, the probability of needing more than $(1+\epsilon)H$ digits, for any fixed $\epsilon > 0$, is very small.
10. It can be shown that if we require certain reasonable properties of a measure of choice or uncertainty, then the formula $-\sum p_i\log p_i$ necessarily follows. These required properties and the proof of this statement are given in Appendix I. The chief property is that the measure be in a sense additive: if a choice is decomposed into a series of choices, the total choice is the sum (properly weighted) of the individual choices.
11. Finally we note that quantities of the type $\sum p_i\log p_i$ have appeared previously as measures of randomness, particularly in statistical mechanics. Indeed the H in Boltzmann's H theorem is defined in this way, $p_i$ being the probability of a system being in cell i of its phase space. Most of the entropy formulas contain terms of this type.
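Several of these properties can be checked numerically. The sketch below uses a hypothetical joint distribution of two correlated events A and B to confirm properties 4 to 7 on one example, and evaluates the H of the example in property 8 by grouping its identical terms. These are spot checks, not proofs.

```python
import math


def H(probs):
    """Entropy in decimal digits."""
    return -sum(p * math.log10(p) for p in probs if p > 0)


# Hypothetical joint distribution p(a, b) of two correlated chance events A and B.
joint = {("a1", "b1"): 0.4, ("a1", "b2"): 0.1, ("a2", "b1"): 0.1, ("a2", "b2"): 0.4}
pA, pB = {}, {}
for (a, b), p in joint.items():
    pA[a] = pA.get(a, 0) + p
    pB[b] = pB.get(b, 0) + p

H_AB, H_A, H_B = H(joint.values()), H(pA.values()), H(pB.values())
# Mean conditional uncertainty H_A(B) = sum_a p(a) H_a(B)
H_A_of_B = sum(pa * H([joint[(a, b)] / pa for b in pB]) for a, pa in pA.items())

assert math.isclose(H_AB, H_A + H_A_of_B)  # property 4
assert H_B <= H_AB                         # property 5
assert H_A_of_B <= H_B                     # property 6

# Property 7: averaging two unequal probabilities raises H.
assert H([0.45, 0.45, 0.1]) > H([0.6, 0.3, 0.1])

# Property 8: 10^30 events of probability (1/2)10^-30 and 10^50 of probability (1/2)10^-50.
H_example = 0.5 * math.log10(2e30) + 0.5 * math.log10(2e50)
print(round(H_example, 2))  # 40.3
```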
The base which is used in taking logarithms in the formula amounts to a choice of the unit of measure. If the base is 10 we will call the resulting units "digits"; if the base is 2 the units will be called "alternatives." One digit is about 3.32 alternatives. A choice from 1000 equally likely possibilities is 3 digits or about 10 alternatives.
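Converting between the two units is a single multiplication by log2 10, as in this minimal sketch (the function name is hypothetical):

```python
import math


def digits_to_alternatives(h_digits: float) -> float:
    """Convert an amount of choice from base-10 units ("digits") to base-2 units ("alternatives")."""
    return h_digits * math.log2(10)


print(round(digits_to_alternatives(1), 2))  # 3.32
choice = math.log10(1000)                   # 1000 equally likely possibilities: 3.0 digits
print(choice, round(digits_to_alternatives(choice), 2))  # 3.0 9.97
```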
2. Language as a Stochastic Process
A natural language, such as English, can be studied from many points of view: lexicography, syntax, semantics, history, aesthetics, etc. The only properties of a language of interest in cryptography are statistical properties. What are the frequencies of the various letters, of different digrams (pairs of letters), trigrams, words, phrases, etc.? What is the probability that a given word occurs in a certain message? The "meaning" of a message has significance only in its influence on these probabilities. For our purposes all other properties of language can be omitted. We consider a language, therefore, to be a stochastic (i.e., statistical) process which generates a sequence of symbols according to some system of probabilities. The symbols will be the letters of the language together with punctuation, spaces, etc., if these occur.
Conversely, any stochastic process which produces a discrete sequence of symbols will be said to be a language. This will include such cases as:
1. Natural written languages such as English, German, Chinese.
2. Continuous information sources that have been rendered discrete by some quantizing process, for example, the quantized speech from a PCM transmitter, or a quantized television signal.
3. "Artificial" languages, where we merely define abstractly a stochastic process which generates a sequence of symbols.
The following are examples of artificial languages.
(A) Suppose we have 5 letters A, B, C, D, E which are chosen each with probability .2, successive choices being independent. This would lead to a sequence of which the following is a typical example.
B D C B C E C C C A D C B D D A A E C E E A
A B B D A E E C A C E E B A E E C B C E A D
This was constructed with the use of a table of random numbers.*
*Kendall and Smith, "Tables of Random Sampling Numbers," Cambridge, 1939.
(B) Using the same 5 letters, let the probabilities be .4, .1, .2, .2, .1, respectively, with successive choices independent. A typical "text" in this language is then:
A A A C D C B D C E A A D A D A C E D A
E A D C A B E D A D D C E C A A A A A D
(C) A more complicated structure is obtained if successive letters are not chosen independently but their probabilities depend on preceding letters. In the simplest case of this type a choice depends only on the preceding letter and not on ones before that. The statistical structure can then be described by a set of transition probabilities $p_i(j)$, the probability that letter i is followed by letter j. The indices i and j range over all the letters in the language. A second equivalent way of specifying the structure is to give the digram probabilities $p(i,j)$, the relative frequency of the digram i j in the language. The letter frequencies $p(i)$ (the probability of letter i), the transition probabilities $p_i(j)$ and the digram probabilities $p(i,j)$ are related by the following formulas:

$$p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\,p_j(i)$$
$$p(i,j) = p(i)\,p_i(j)$$
$$\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1$$
As a specific example suppose there are three letters A, B, C with the probability tables:

Transition probabilities $p_i(j)$:
          j = A   j = B   j = C
i = A       0      .8      .2
i = B      .5      .5       0
i = C      .5      .4      .1

Letter frequencies $p(i)$:
A: 9/27    B: 16/27    C: 2/27

Digram probabilities $p(i,j)$:
          j = A   j = B    j = C
i = A       0     4/15     1/15
i = B     8/27    8/27       0
i = C     1/27    4/135    1/135

A typical text in this language is the following.
A B B A B A B A B A B A B A B B B A B B B B B A B
A B A B A B A B B B A C A C A B B A B B B B A B B
A B A C B B B A B A
The next increase in complexity would involve trigram frequencies but no more. The choice of a letter would depend on the preceding two letters but not on the text before that point. A set of trigram frequencies $p(i,j,k)$ or equivalently a set of transition probabilities $p_{ij}(k)$ would be required. Continuing in this way one obtains successively more complicated stochastic processes. In the general n-gram case a set of n-gram probabilities $p(i_1, i_2, \ldots, i_n)$ or of transition probabilities $p_{i_1 i_2 \cdots i_{n-1}}(i_n)$ is required to specify the statistical structure.
(D) Stochastic processes can also be defined which produce a text consisting of a sequence of "words." Suppose there are 5 letters A, B, C, D, E and 16 "words" in the language with associated probabilities:
.10 A       .16 BEBE    .11 CABED    .04 DEB
.04 ADEB    .04 BED     .05 CEED     .15 DEED
.05 ADEE    .02 BEED    .08 DAB      .01 EAB
.01 BADD    .05 CA      .04 DAD      .05 EE
Suppose successive "words" are chosen independently and are separated by a space. A typical message might be:
DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE CABED BEBE BED DAB DEED ADEB
If all the words are of finite length this process is equivalent to one of the preceding type, but the description may be simpler in terms of the word structure and probabilities. We may also generalize here and introduce transition probabilities between words, etc.
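Example (C) can be encoded and checked against the relations above. The sketch below assumes the tables as given; its assertions confirm they are mutually consistent (the letter frequencies are stationary under the transition probabilities, and the digram table equals p(i)p_i(j)). It then generates text by a simple Markov chain, so the output varies from run to run. Names are illustrative.

```python
import random
from fractions import Fraction as F

# Transition probabilities p_i(j) of example (C): P[i][j].
P = {
    "A": {"A": F(0), "B": F(4, 5), "C": F(1, 5)},
    "B": {"A": F(1, 2), "B": F(1, 2), "C": F(0)},
    "C": {"A": F(1, 2), "B": F(2, 5), "C": F(1, 10)},
}
p = {"A": F(9, 27), "B": F(16, 27), "C": F(2, 27)}  # letter frequencies p(i)

# Consistency checks: p(i) = sum_j p(j) p_j(i), p(i, j) = p(i) p_i(j), sums equal 1.
assert all(p[i] == sum(p[j] * P[j][i] for j in p) for i in p)
digram = {(i, j): p[i] * P[i][j] for i in p for j in p}
assert sum(digram.values()) == 1
assert digram[("A", "B")] == F(4, 15) and digram[("C", "C")] == F(1, 135)


def generate(n, start="A", rng=random):
    """Generate n letters of the example (C) language, starting from `start`."""
    letters, current = [], start
    for _ in range(n):
        letters.append(current)
        choices = list(P[current])
        current = rng.choices(choices, weights=[float(P[current][j]) for j in choices])[0]
    return " ".join(letters)


print(generate(40))
```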
These artificial languages are useful in constructing simple problems and examples to illustrate various possibilities. We can also approximate to a natural language by means of a series of simple artificial languages. The zero order approximation is obtained by choosing all letters with the same probability and independently. The first order approximation is obtained by choosing successive letters independently but each letter having the same probability that it has in the natural language. Thus in the first order approximation to English, E is chosen with probability .12 (its frequency in normal English) and W with probability .02, but there is no influence between adjacent letters and no tendency to form the preferred digrams such as TH, ED, etc. In the second order approximation digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies $p_i(j)$, the frequency with which letter j follows letter i. In the third order approximation trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.
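The zero, first, and higher order approximations can be sketched by counting frequencies in a sample text rather than using published tables. This is an illustration under stated assumptions: a 27-symbol alphabet of letters and space, a hypothetical sample, and a fallback to single-symbol frequencies whenever a context never occurs in the sample. The function name is illustrative.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 27 symbols: 26 letters and a space


def approximation(sample: str, order: int, length: int, rng=random) -> str:
    """Text whose statistics imitate `sample` up to the given order.

    order 0: symbols independent and equiprobable.
    order 1: symbols independent, with the sample's single-symbol frequencies.
    order n >= 2: each symbol drawn according to what follows the preceding n - 1
    symbols in the sample (falling back to order 1 if that context never occurs).
    """
    text = "".join(c for c in sample.upper() if c in ALPHABET)
    if order == 0:
        return "".join(rng.choice(ALPHABET) for _ in range(length))
    k = order - 1
    unigram = Counter(text)
    followers = defaultdict(Counter)
    for i in range(len(text) - k):
        followers[text[i:i + k]][text[i + k]] += 1
    out = ""
    while len(out) < length:
        table = followers.get(out[-k:]) if k and len(out) >= k else None
        table = table or unigram
        symbols = list(table)
        out += rng.choices(symbols, weights=[table[s] for s in symbols])[0]
    return out


sample = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG WHILE THE CAT SLEEPS"  # hypothetical
for order in range(4):
    print(order, approximation(sample, order, 60))
```

A real experiment would need a much longer sample; with a short one, higher orders mostly reproduce fragments of the sample itself.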
3. The Series of Approximations to English
To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol "alphabet," the 26 letters and a space.
1. Zero order approximation (symbols independent and equiprobable):
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD
2. First order approximation (symbols independent but with frequencies of English text):
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL
3. Second order approximation (digram structure as in English):
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE
4. Third order approximation (trigram structure as in English):
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE
5. 1st Order Word Approximation. Rather than continue with tetragram, ..., n-gram structure, it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE
6. 2nd Order Word Approximation. The word transition probabilities are correct but no further structure is included.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED
The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable.
The first two samples were constructed by the use of a book of random numbers in conjunction with (for (2)) a table of letter frequencies. This method might have been continued for (3), (4), and (5), since digram, trigram, and word frequency tables are available, but a simpler equivalent method was used. To construct (3), for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page, this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5), and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
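The book-opening procedure can be mimicked on any text held in memory. This sketch treats a string as the "book," jumps to a random position, reads forward (wrapping around at most once) until the last order - 1 recorded symbols appear, and records the symbol that follows. Like the manual method, it samples the n-gram frequencies only approximately, since occurrences that follow long gaps are found more often. The function name and sample text are hypothetical, and it covers letter units only, not the word units of (5) and (6).

```python
import random


def book_method(book: str, order: int, length: int, rng=random) -> str:
    """Imitate the manual method for an approximation of the given order (order >= 1).

    Repeatedly open the 'book' at a random position, read forward (wrapping around
    at most once) until the last order - 1 recorded symbols appear, and record the
    symbol that follows them.
    """
    k, n = order - 1, len(book)
    start = rng.randrange(n - k)
    out = book[start:start + k]            # the first k symbols, taken at a random place
    while len(out) < length:
        context = out[len(out) - k:]       # last k symbols ('' when order == 1)
        pos = rng.randrange(n)
        for step in range(n):
            i = (pos + step) % n
            if book.startswith(context, i) and i + k < n:
                out += book[i + k]
                break
        else:
            raise ValueError(f"context {context!r} is never followed by a symbol")
    return out[:length]


book = "THE CAT SAT ON THE MAT THE"  # hypothetical stand-in for a real book
print(book_method(book, 2, 40))
```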
The stochastic process (6) is already sufficiently close to English for many cryptographic purposes, since most cryptanalysis is based on "local" structure of not more than two or three words in length.
4. Graphical Representation of a Markoff Process
Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature.* The general case can be described as follows. There exist a finite number of possible "states" of a system: $S_1, S_2, \ldots, S_n$. In addition there is a set of transition probabilities $p_i(j)$, the probability that if the system is in state $S_i$ it will next go to state $S_j$. To make this Markoff process into a language generator we need only assume that a letter is produced for each transition from one state to another. The states will correspond to the "residue of influence" from preceding letters.

*For a detailed treatment see M. Fréchet, "Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles," Paris, Gauthier-Villars, 1938.
The situation can be represented graphically as shown in Figs. 1, 2, 3 and 4. The "states" are the junction points in the graph, and the probabilities and letters produced for a transition are given beside the corresponding line. Fig. 1 is for example B in Section 2, while Fig. 2 corresponds to example C. In Fig. 1 there is only one state, since successive letters are independent. In Fig. 2 there are as many states as letters. If a trigram example were constructed there would be at most $n^2$ states (n being the number of letters), corresponding to the possible pairs of letters preceding the one being chosen. Figs. 3 and 4 show graphs for the case of word structure in example D. In these, S corresponds to the "space" symbol. In Fig. 3 each word has a separate chain of branches from the left to the right junction point, while in Fig. 4 the branches have been combined, simplifying the graph.
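A graph of this kind maps directly onto a small data structure. The sketch below is an illustration, not a transcription of any figure: the dictionary `GRAPH` and its probabilities are hypothetical, chosen only to have the digram shape of Fig. 2 (one state per letter, each line labeled with a probability and the letter it produces), and `generate` is an illustrative name.

```python
import random

# Each junction point (state) lists its outgoing lines as
# (probability, letter produced, next state). Hypothetical numbers.
GRAPH = {
    "A": [(0.5, "B", "B"), (0.5, "C", "C")],
    "B": [(0.6, "A", "A"), (0.4, "C", "C")],
    "C": [(0.8, "A", "A"), (0.2, "B", "B")],
}


def generate(graph, start, n, seed=0):
    """Follow n lines from `start`, emitting the letter on each line taken."""
    rng = random.Random(seed)
    state, letters = start, []
    for _ in range(n):
        lines = graph[state]
        weights = [p for p, _, _ in lines]
        _, letter, state = rng.choices(lines, weights=weights)[0]
        letters.append(letter)
    return "".join(letters)


print(generate(GRAPH, "A", 30))
```

Naming each state by the pair of preceding letters, rather than the last letter, would turn the same walker into a trigram generator.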
5. Pure and Mixed Languages
As we have indicated above, a "language" for our purposes can be considered to be generated by a Markoff process. Among the possible discrete Markoff processes there is a group with special properties of significance in cryptographic work. This special class consists of the "ergodic" processes, and we shall call the corresponding languages "pure languages." Although a rigorous definition of an ergodic process is somewhat involved, the general idea is simple. In an ergodic process every sequence produced by the process is the same in statistical properties. Thus the letter frequencies, digram frequencies, etc., obtained from particular sequences will, as the lengths of the sequences increase, approach definite limits independent of the particular sequence. Actually this is not true of every sequence, but the set for which it is false has probability zero. Roughly, the ergodic property means statistical homogeneity.
All the examples of artificial languages given above are pure, the corresponding Markoff process being ergodic. This property is related to the structure of the corresponding graph. If the graph has two properties, the language it generates will be pure. These properties are:

1. The graph cannot be divided into two parts A and B such that it is impossible to go from junction points in part A to junction points in part B along lines of the graph in the direction of the arrows, and also impossible to go from nodes in part B to nodes in part A.

2. A closed series of lines in the graph with all arrows on the lines pointing in the same orientation will be called a "circuit." The "length" of a circuit is the number of lines in it. Thus in Fig. 4 the series BEES is a circuit of length 4. The second property required is that the greatest common divisor of the lengths of all circuits in the graph be one.
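The two graph conditions can be checked mechanically. The sketch below uses a standard method assumed here rather than taken from the text: condition 1 is tested as strong connectivity (every junction point reachable from every other), the usual formal counterpart of the condition, and the greatest common divisor in condition 2 is computed from breadth-first distances, using the fact that for a strongly connected graph the gcd of dist(u) + 1 - dist(v) over all lines u -> v equals the gcd of all circuit lengths. Graphs are given as a hypothetical adjacency dictionary; `period` and `is_pure` are illustrative names.

```python
from collections import deque
from math import gcd


def _reachable(edges, start):
    """Junction points reachable from `start` following the arrows in `edges`."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def period(graph):
    """gcd of all circuit lengths, or None if the graph is not strongly connected.

    `graph` maps each junction point to the list of points its lines lead to.
    """
    nodes = set(graph)
    root = next(iter(graph))
    reverse = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    # Condition 1 (as strong connectivity): every point reachable from root and back.
    if _reachable(graph, root) != nodes or _reachable(reverse, root) != nodes:
        return None
    # Breadth-first distances from the root.
    dist, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Condition 2: gcd over all lines u -> v of dist[u] + 1 - dist[v].
    d = 0
    for u, targets in graph.items():
        for v in targets:
            d = gcd(d, dist[u] + 1 - dist[v])
    return d


def is_pure(graph):
    return period(graph) == 1


print(period({"a": ["b", "c"], "b": ["a"], "c": ["a"]}))   # 2
print(is_pure({"a": ["a", "b"], "b": ["a"]}))              # True
```

The first graph is the three-letter a, b, c example discussed next, whose circuits all have even length.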
If the first condition is satisfied but the second one is violated by having the greatest common divisor equal to $d > 1$, the sequences have a certain type of periodic structure. The various sequences fall into d different classes which are statistically the same apart from a shift of the origin (i.e., which letter in the sequence is called letter 1). By a shift of from 0 up to $d - 1$ any sequence can be made statistically equivalent to any other. A simple example with $d = 2$ is the following. There are three possible letters a, b, c. Letter a is followed by either b or c with probabilities 1/3 and 2/3 respectively. Either b or c is always followed by letter a. Thus a typical sequence is

a b a c a c a c a b a c a b a b a c a c

This type of situation is not of much importance for our work.
If the first condition is violated, the graph may be "separated" into a set of subgraphs each of which satisfies the first condition. We will assume that the second condition is also satisfied for each subgraph. We have in this case what may be called a "mixed" language made up of a number of pure components. The components correspond to the various subgraphs. If $L_1, L_2, L_3, \ldots$ are the component languages we may write

$$L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots$$

where $p_i$ is the a priori probability of the component language $L_i$.
Physically the situation represented is this. There are several different languages $L_1, L_2, L_3, \ldots$ which are each of homogeneous statistical structure (i.e., they are pure languages). We do not know a priori which is to be used, but once the sequence starts in a given pure component $L_i$ it continues indefinitely according to the statistical structure of that component. We do have, however, a set of a priori probabilities $p_1, p_2, \ldots$ for the various components.

As an example one may take two of the artificial languages defined above and assume $p_1 = .2$ and $p_2 = .8$. A sequence from the mixed language

$$L = .2\,L_1 + .8\,L_2$$

would be obtained by choosing first $L_1$ or $L_2$ with probabilities .2 and .8, and after this choice generating a sequence from whichever was chosen.
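A mixed language can be sampled exactly as described: choose the component once, then stay in it. In this sketch the two components `pure_1` and `pure_2` are hypothetical independent-letter sources over A and B, `sample_mixed` is an illustrative name, and the priors .2 and .8 follow the example above.

```python
import random


def sample_mixed(components, priors, n, seed=0):
    """Pick one pure component with its a priori probability, then draw all
    n letters from that component (the choice is never revised)."""
    rng = random.Random(seed)
    index = rng.choices(range(len(components)), weights=priors)[0]
    return index, components[index](rng, n)


# Two hypothetical pure components: independent letters, different frequencies.
def pure_1(rng, n):
    return "".join(rng.choices("AB", weights=[0.9, 0.1], k=n))


def pure_2(rng, n):
    return "".join(rng.choices("AB", weights=[0.1, 0.9], k=n))


which, text = sample_mixed([pure_1, pure_2], [0.2, 0.8], 20)
print(which, text)
```

Over many independent runs the first component is chosen about 20% of the time, but any single long sequence shows the statistics of only one component.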
A natural language, such as English or German, is not, of course, pure. Different kinds of text (literary, newspaper, technical or military) display consistently different types of structure. These differences are small, however, in comparison with the differences between different natural languages. If only local structure (letter, digram and trigram frequencies, for instance) is of much importance, it is reasonable to consider "normal English" to be nearly pure.
6. Information Rate and Redundancy of a Language
Suppose we have a pure language L produced by a given Markoff process. Associated with the language there are certain parameters which are of significance in questions of transforming the language and in cryptography. The most important of these is what we will call the "information rate" R for the language. It measures the rate at which the Markoff process "generates information," as determined by the measurement of the amount of choice available on the average per letter of text that is produced. In Section 1 we defined the amount of choice when there are various possibilities with probabilities $p_1, p_2, \ldots, p_n$ as

$$H = -\sum_{i=1}^{n} p_i \log p_i.$$
In a Markoff process with a number of different "states" there will be a choice value $H_i$ for each of these states and a probability of being in each of the states (or a frequency with which this state occurs). If this relative frequency for state i is $P_i$, the average amount of choice is

$$R = \sum_i P_i H_i$$

summed over all the states. This is the definition of the information rate for the language. If $p_i(j)$ is the probability of producing letter j when in state i we have

$$H_i = -\sum_j p_i(j) \log p_i(j),$$

the sum being over all the letters in the language. Thus

$$R = -\sum_{i,j} P_i\, p_i(j) \log p_i(j).$$
The information rate R has the units of alternatives (or digits) per letter, since it measures the average amount of choice per letter of text that is produced. Logarithms to the base 2 give alternatives; logarithms to the base 10 give decimal digits.
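The rate $R = \sum_i P_i H_i$ can be computed numerically for a small graph. This sketch assumes a hypothetical `{state: [(probability, letter, next_state), ...]}` format, base-2 logarithms (so R comes out in alternatives per letter), and finds the state frequencies $P_i$ by iterating a "lazy" chain that stays put with probability 1/2; that chain has the same stationary frequencies as the original and converges even for periodic graphs. The function name `information_rate` is illustrative.

```python
from math import log2


def information_rate(graph, iterations=500):
    """R = sum_i P_i H_i for a graph {state: [(p, letter, next_state), ...]}."""
    states = list(graph)
    freq = {s: 1 / len(states) for s in states}
    for _ in range(iterations):
        # Lazy step: stay with probability 1/2, otherwise follow one of the lines.
        nxt = {s: 0.5 * freq[s] for s in states}
        for s in states:
            for p, _, t in graph[s]:
                nxt[t] += 0.5 * freq[s] * p
        freq = nxt
    # H_i = -sum_j p_i(j) log2 p_i(j): the choice available when in state i.
    choice = {s: -sum(p * log2(p) for p, _, _ in graph[s] if p > 0) for s in states}
    return sum(freq[s] * choice[s] for s in states), freq


single = {"S": [(1 / 2, "A", "S"), (1 / 4, "B", "S"), (1 / 8, "C", "S"), (1 / 8, "D", "S")]}
print(information_rate(single)[0])   # 1.75

periodic = {"a": [(1 / 3, "b", "b"), (2 / 3, "c", "c")],
            "b": [(1.0, "a", "a")], "c": [(1.0, "a", "a")]}
rate, freq = information_rate(periodic)
print(round(freq["a"], 6), round(rate, 6))
```

For the periodic a, b, c example of Section 5 the state a is occupied half the time and is the only state offering a choice, so R is half of $H(\tfrac13, \tfrac23)$, about 0.459 alternatives per letter.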
A second parameter of importance is the "maximum rate" $R_0$ for the source. This is defined simply as the logarithm of the number of different letters in the language. $R_0$ is also measured in alternatives or digits per letter. If successive letters are chosen independently and each letter is equally likely, $R_0 = R$. Otherwise we have $R < R_0$.
$R_0$ and R are actually two limiting cases of information rates for the language. $R_0$ may be said to be the rate when no statistical structure is taken into consideration, and R is the rate when all the structure is taken into account. Between these there is an infinite series of rates $R_1, R_2, R_3, \ldots$ which take some of the statistical structure into account. $R_1$ takes the letter frequencies into account and is defined by

$$R_1 = -\sum_i p(i) \log p(i)$$

where p(i) is the probability of letter i. $R_2$ takes digram structure into account and is defined by

$$R_2 = -\sum_{i,j} p(i)\, p_i(j) \log p_i(j)$$

where the p(i) are letter probabilities and the $p_i(j)$ the transition probabilities, i.e., the probability of letter i being followed by letter j. In general we define

$$R_n = -\sum_{i_1, \ldots, i_n} p(i_1, \ldots, i_{n-1})\, p_{i_1 \cdots i_{n-1}}(i_n) \log p_{i_1 \cdots i_{n-1}}(i_n)$$

where the sum is on all indices $i_1, \ldots, i_n$, $p(i_1, \ldots, i_{n-1})$ is the probability of the (n-1)-gram $i_1 \cdots i_{n-1}$, and $p_{i_1 \cdots i_{n-1}}(i_n)$ is the probability of this (n-1)-gram being followed by letter $i_n$. $R_n$ may be called the n-gram information rate for the language. It can be shown that

$$R_0 \ge R_1 \ge R_2 \ge \cdots \ge R_\infty = R.$$
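The n-gram rates can be estimated from a sample of text. This is a minimal sketch, assuming that relative frequencies of overlapping n-grams in the sample stand in for the true probabilities (the estimates become unreliable once n is large compared with the length of the sample) and that logarithms are to base 2; `ngram_rate` is an illustrative name.

```python
from collections import Counter
from math import log2


def ngram_rate(text, n):
    """Estimate R_n (alternatives per letter) from overlapping n-gram counts."""
    grams = Counter(text[k:k + n] for k in range(len(text) - n + 1))
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count          # count of the preceding (n-1)-gram
    total = sum(grams.values())
    rate = 0.0
    for gram, count in grams.items():
        joint = count / total                 # p(i_1..i_{n-1}) * p_{i_1..i_{n-1}}(i_n)
        cond = count / contexts[gram[:-1]]    # p_{i_1..i_{n-1}}(i_n)
        rate -= joint * log2(cond)
    return rate


text = "aab" * 30
print(log2(len(set(text))), [ngram_rate(text, n) for n in (1, 2, 3)])
```

For this periodic sample $R_0 = 1$, $R_1 \approx 0.918$, $R_2 = 60/89 \approx 0.674$ and $R_3 = 0$: once two letters are known the next is determined.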
These rates determine how much a language can be "compressed" in length by a suitable encoding process. A language with maximum rate $R_0$ and rate R can be transformed in such a way that a sequence N letters long is transformed into a sequence only N' letters long, where

$$N' R_0 = N R.$$

(This is approximate and only exactly true in the limit as $N \to \infty$.) Thus the information is "compressed" in the ratio $R/R_0$. This is the greatest compression ratio possible; it makes use of all the statistical structure of the language. If only n-gram structure is made use of, a compression ratio $R_n/R_0$ is the best possible.
The compression obtained in this way is only a statistical gain. Some infrequent sequences are encoded into much longer sequences, while the more probable ones go into shorter sequences, so that on the average the length is decreased. It is the type of compression obtained in telegraphy by using the shortest telegraph symbol, a single dot, for the most frequent letter E, while uncommon letters Q, Z, etc., are encoded into longer telegraph symbols. An average reduction in time of transmission is obtained, but there are possible sequences, e.g., Q Q Q ..., which require much longer.
Performing a transformation on a language L which compresses as much as possible will be called reducing L to a "normal" form. When this has been done it can be shown that all letters in the output are equally likely and independent. Actually to realize this transformation would usually require an infinitely complex machine, but we can always approximate it as closely as desired with a machine of finite complexity.
The quantity

$$D = R_0 - R$$

will be called the redundancy rate of the language. It measures the excess information that is sent if sequences in the language are transmitted in their original form (without compression or reduction to normal form). Correspondingly there is a whole series of redundancy rates:

$$\begin{aligned} D_1 &= R_0 - R_1 \\ D_2 &= R_0 - R_2 \\ &\;\;\vdots \\ D_n &= R_0 - R_n \\ &\;\;\vdots \\ D &= R_0 - R \end{aligned}$$

$D_n$ is the redundancy rate due to n-gram structure in the language.
The redundancy D can also be said to measure the amount of statistical structure in the language. If the sequence is purely random (letters independent and equally likely), D = 0, while at the other extreme, if each letter is completely determined by preceding letters with no freedom of choice, D has its maximum possible value $R_0$. It is sometimes convenient to use the "relative" redundancy $D/R_0$, which must lie between 0 and 100%.
If we have a source of rate R and maximum rate $R_0$ (both in digits per letter) and consider the possible sequences of N letters, these fall into two groups for N large. One group of "high probability" sequences contains about

$$10^{RN}$$

sequences (where we have assumed R measured in digits per letter). All of these have substantially the same logarithmic probability. The remainder of the total of $10^{R_0 N}$ possible sequences are of very small probability; in fact their total probability approaches zero as N increases. The logarithm of the probability of an individual sequence in the high probability group is thus about $-RN$. In a precise statement of these results we must allow a certain fuzziness in R, i.e., replace R by $R \pm \epsilon$ where $\epsilon \to 0$ as $N \to \infty$.
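The concentration of the logarithmic probability near $-RN$ can be observed by simulation. This sketch assumes the independent four-letter source of the example that follows (probabilities 1/2, 1/4, 1/8, 1/8, so R = 7/4 alternatives per letter) and uses base-2 logarithms rather than digits for convenience; `per_letter_log_prob` is an illustrative name.

```python
import random
from math import log2

PROBS = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}   # R = 7/4 alternatives/letter


def per_letter_log_prob(probs, n, seed=0):
    """Draw n independent letters and return -log2 P(sequence) / n."""
    rng = random.Random(seed)
    letters = list(probs)
    seq = rng.choices(letters, weights=[probs[a] for a in letters], k=n)
    return -sum(log2(probs[a]) for a in seq) / n


for n in (10, 100, 10_000):
    print(n, per_letter_log_prob(PROBS, n))
```

As n grows the printed value settles near 1.75; dividing by $\log_2 10$ would express the same quantity in digits, matching the $10^{RN}$ count.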
Reduction of a language to normal form is performed by properly matching the probabilities of sequences to the lengths of the corresponding sequences in the normal form. The "high probability" sequences are translated into short sequences and the remainder into longer sequences.
An example will clarify the results we have given. Let the language contain 4 letters A, B, C, D. In a sequence, successive letters are chosen independently, the four letters having probabilities 1/2, 1/4, 1/8, 1/8, respectively. We have

$$R_0 = \log_2 4 = 2 \text{ alternatives/letter}$$

and

$$R_1 = R_2 = \cdots = R = \tfrac12 \log_2 2 + \tfrac14 \log_2 4 + \tfrac28 \log_2 8 = \tfrac12 + \tfrac12 + \tfrac34 = \tfrac74 \text{ alternatives/letter}.$$
By a suitable transformation the average length of sequences can be reduced by the factor $R/R_0 = 7/8$. A transformation to do it is the following. First we translate into a sequence of binary digits (0 or 1) by the following table:

A   0
B   10
C   110
D   111
After this, pairs of binary digits are translated into the original alphabet as follows:

00   A'
01   B'
10   C'
11   D'
For a typical sequence this works out as shown below:

A   B   C   A   B   A   C   B   B   D   A   A   D   A   D   A
0   10  110 0   10  0   110 10  10  111 0   0   111 0   111 0

Regrouping and translation back into letters:

01  01  10  01  00  11  01  01  01  11  00  11  10  11  10
B'  B'  C'  B'  A'  D'  B'  B'  B'  D'  A'  D'  C'  D'  C'
In this case there are 16 letters in the original and 15 in the final text. Thus, due to the small redundancy and the short length of the text, only part of the saving is evident. In a long text, however, the full reduction of 7/8 would appear. This may be verified directly in this case. In a long text of N letters each letter will appear with about its appropriate frequency. Thus the number of binary digits will be about

$$N\left[\tfrac12 \cdot 1 + \tfrac14 \cdot 2 + \tfrac18 \cdot 3 + \tfrac18 \cdot 3\right] = \tfrac74 N,$$

since each A gives one binary digit, each B gives two, etc. The number of letters in the final text is half this, since each pair of binary digits goes into one letter. Thus the reduction is by a factor 7/8.
It is also easy to see in this case that the binary digits are equally likely and independent, and from this that the final text letters are also.
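The two-stage translation of this example is easy to reproduce. The sketch below encodes with the table above, regroups the binary digits in pairs, and then checks the long-text claims on a simulated text; the names `CODE`, `PAIRS` and `reduce_to_normal` are illustrative, and an odd leftover digit is simply dropped in this sketch.

```python
import random
from collections import Counter

CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
PAIRS = {"00": "A'", "01": "B'", "10": "C'", "11": "D'"}


def reduce_to_normal(text):
    """Encode letters into binary digits, then translate pairs of digits back into letters."""
    bits = "".join(CODE[ch] for ch in text)
    # An odd final digit (if any) is left unencoded in this sketch.
    out = [PAIRS[bits[k:k + 2]] for k in range(0, len(bits) - 1, 2)]
    return bits, out


bits, out = reduce_to_normal("ABCABACBBDAADADA")
print(len(bits), " ".join(out))   # 30 B' B' C' B' A' D' B' B' B' D' A' D' C' D' C'

rng = random.Random(0)
long_text = "".join(rng.choices("ABCD", weights=[4, 2, 1, 1], k=20_000))
_, long_out = reduce_to_normal(long_text)
print(len(long_out) / len(long_text))   # close to 7/8
print(Counter(long_out))                 # the four output letters about equally frequent
```

The short example reproduces the 16 to 15 reduction; on the long simulated text the length ratio approaches 7/8 and the output letters appear with nearly equal frequencies, as the normal form requires.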
This situation is more complicated for mixed languages and we shall not enter into it here. We may note, however, that if

$$L = p_1 L_1 + p_2 L_2 + \cdots + p_n L_n$$

where $L_i$ is pure with rate $R_i$, then the long sequences of L fall into (n+1) groups. The first n groups correspond to the pure components. Those in group i number about $10^{R_i N}$ and have logarithmic probability about $\log p_i - R_i N$. The last group contains all other sequences and has a small total probability.
7. Redundancy Characteristic of a Language

The form of the curve D(N) as a function of N may be called the redundancy characteristic of the language. In a rough way it describes the way in which the redundancy appears. In Fig. 5 several types of characteristics are shown, all with the same final redundancy. The way in which the curve approaches this final value is of importance in cryptography. For languages which reach their final redundancy at one or two letters (Curves 1 and 2) one type of cipher (ideal ciphers) can be used. For those which remain near zero out to fairly large N (like Curve 5) another type is appropriate. Natural languages are apt to show a characteristic more like 3, and this makes them difficult to encipher with security by simple means.
Examples:

1. A language in which successive letters are independent but with different probabilities has a characteristic of Type 1.

2. Consider a language constructed as follows. First select $26^8$ different sequences of letters, each 16 letters long, from the $26^{16}$ possible sequences of this length. This should be a random selection. The 16-letter sequences chosen are the "words" of the language. Messages are random sequences of these "words." Such a language has a characteristic like Curve 5.

3. A language with digram structure only, such as Example C in Section 2 above, has a characteristic of Type 2 in Fig. 5, reaching its final value at N = 2.

4. English has the characteristic 3 in Fig. 5.
The redundancy characteristic describes how the structure in the language is spread out. If the structure is localized, the curve rises rapidly to its final value. If there are long range influences, the asymptotic value is approached more slowly. If the structure is "locally random," the curve will remain near zero for small N.
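A redundancy characteristic can be estimated from generated text. This sketch reads D(N) as the n-gram redundancy $D_N = R_0 - R_N$ (an interpretation assumed here) and estimates $R_N$ from overlapping N-gram counts in base 2. The source is hypothetical: letters A, B, C, where each letter is followed by one of the two other letters with equal probability, so the letters are equally frequent but no letter is ever repeated. Its curve should be near 0 at N = 1 and jump to about $\log_2 3 - 1 \approx 0.585$ at N = 2, a Type 2 characteristic.

```python
import random
from collections import Counter
from math import log2


def ngram_rate(text, n):
    """Empirical R_n in alternatives per letter, from overlapping n-gram counts."""
    grams = Counter(text[k:k + n] for k in range(len(text) - n + 1))
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    total = sum(grams.values())
    return -sum(c / total * log2(c / contexts[g[:-1]]) for g, c in grams.items())


def redundancy_characteristic(text, max_n):
    """[D_1, ..., D_max_n] with D_N = R_0 - R_N and R_0 = log2(number of letters)."""
    r0 = log2(len(set(text)))
    return [r0 - ngram_rate(text, n) for n in range(1, max_n + 1)]


# Hypothetical digram source: after each letter, one of the two other letters, equally likely.
rng = random.Random(0)
letters, prev = [], "A"
for _ in range(50_000):
    prev = rng.choice([c for c in "ABC" if c != prev])
    letters.append(prev)
print([round(d, 3) for d in redundancy_characteristic("".join(letters), 4)])
```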
8. Secrecy Systems
Before we can apply any mathematical analysis to secrecy systems, it is necessary to idealize the situation suitably, and to define in a mathematically acceptable way what we shall mean by a secrecy system. A "schematic" diagram of a general secrecy system is shown in Fig. 6. At the transmitting end there are two information sources: a message source and a key source. The key source produces a particular key from among those which are possible in the system. This key is transmitted by some means, supposedly not interceptible, e.g. by messenger, to the receiving end. The message source produces a message (the "clear") which is enciphered, and the resulting cryptogram is sent to the receiving end by a possibly interceptible means, for example radio. At the receiving end the cryptogram and key are combined in the decipherer to recover the message.
Evidently the encipherer performs a functional operation. If M is the message, K the key, and E the enciphered message, or cryptogram, we have

$$E = f(M, K),$$

i.e. E is a function of M and K. We prefer to think of this, however, not as a function of two variables but as a (one parameter) family of operations or transformations, and we write it

$$E = T_i M.$$

The transformation $T_i$ applied to message M produces cryptogram E. The index i corresponds to the particular key being used. If there are m possible keys there will be m transformations in the family, $T_1, T_2, \ldots, T_m$.
At the receiving end it must be possible to recover M, knowing E and K. Thus the transformations in the family must have unique inverses

$$M = T_i^{-1} E;$$

at any rate this inverse must exist uniquely for every E which can be obtained from an M with key i.
The key source can be thought of as a "probability machine," something which chooses from the possible keys according to a system of probabilities. Mathematically, then, the keys (or the parameter of the family of transformations) belong to a probability or measure space. Hence we arrive at the definition:

A secrecy system is a family of uniquely reversible transformations $T_i$ of a message space $\Omega_M$ into a cryptogram space $\Omega_E$, the parameter i belonging to a probability space $\Omega_K$.

Conversely, any set of entities of this type will be called a "secrecy system."
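The definition can be made concrete with a small family of transformations. This sketch is illustrative only: it uses the 26 letter shifts (the Caesar cipher listed among the examples in Section 11) as the family $T_i$, assumes equally likely keys as the measure on the key space, and checks on all two-letter messages that each $T_i$ is uniquely reversible. The names `shift`, `unshift`, `KEY_PROBS`, `T` and `T_INV` are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(i):
    """T_i: advance every letter i places (an illustrative family of transformations)."""
    return lambda msg: "".join(ALPHABET[(ALPHABET.index(c) + i) % 26] for c in msg)


def unshift(i):
    """T_i^{-1}: move every letter back i places."""
    return lambda crypt: "".join(ALPHABET[(ALPHABET.index(c) - i) % 26] for c in crypt)


# The system: a family {T_i}, their inverses, and a probability measure on the keys.
KEY_PROBS = {i: 1 / 26 for i in range(26)}        # assumed equally likely keys
T = {i: shift(i) for i in KEY_PROBS}
T_INV = {i: unshift(i) for i in KEY_PROBS}

print(T[3]("ATTACK"), T_INV[3](T[3]("ATTACK")))   # DWWDFN ATTACK
```

Any other uniquely reversible family, with any measure on its keys, would satisfy the definition equally well.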
The system can be visualized mechanically as a machine with one or more controls on it. A sequence of letters, the message, is fed into the input of the machine and a second series emerges at the output. The particular setting of the controls corresponds to the particular key being used. Some method must be prescribed for choosing the key from all the possible ones.
To make the problem mathematically tractable we shall assume that the enemy knows the system being used. That is, he knows the family of transformations $T_i$ and the probabilities of choosing various keys.
One might object to this as being unrealistic, in that the cryptanalyst often does not know what system was used or the probabilities of various keys. There are two answers to this objection.

1. The assumption is actually the one ordinarily used in cryptographic studies. It is pessimistic and hence safe, but in the long run realistic (particularly in military work), since one must expect his system to be found out eventually through espionage, captured equipment, prisoners, etc. Thus, even when an entirely new system is devised, so that the enemy cannot assign any a priori probability to it without discovering it himself, one must still live with the expectation of his eventual knowledge.

2. The restriction is much weaker than appears at first, due to our broad definition of what constitutes the system. Suppose a cryptographer intercepts a message and does not know whether a substitution, transposition, or Vigenere type cipher was used. He can consider this as being enciphered by a system in which part of the key is the specification of which of these types was used, the next part being the particular key for that type. These three different possibilities are assigned probabilities according to his best guesses of the a priori probabilities of the encipherer using the respective types of cipher.
A second possible objection to our definition of secrecy systems is that no account is taken of the common practice of inserting nulls in a message and the use of multiple substitutes. Thus there is not a unique $E = T_i M$; actually the encipherer can choose at will among a number of different E's for the same message and key. This situation could be handled, but would only add complexity at the present stage, without altering any of the basic results. To define the more general secrecy system, one would add a second parameter to the transformations $T_i$, corresponding to the various choices of cryptograms for a given message and key. It is possible, but not always desirable, to consider this second parameter as part of the key, since it does not need to be transmitted to the receiving point.
We also assume that the enemy is in possession of the measure in the space $\Omega_M$, the a priori probabilities of various messages. The same objection and essentially the same answer might be given to this assumption as to his knowledge of the transformations $T_i$. This measure, however, we do not consider as part of the secrecy system, for reasons which will appear later. The secrecy system whose transformations are $T_i$ will be denoted by T, and this concept includes the space on which T operates (without its measure), the transformations $T_i$, and the spaces $\Omega_K$ and $\Omega_E$, the former with its probability measure.
If the messages are produced by a Markoff process of the type described previously, the probabilities of various messages are determined by the structure of the Markoff process. For the present, however, we wish to take a more general view of the situation and regard the messages as merely an abstract set of entities with associated probabilities, not necessarily composed of a sequence of letters and not necessarily produced by a Markoff process.
It should be emphasized that throughout the paper a secrecy system means not one but a set of many transformations. After the key is chosen only one of these transformations is used, and we might be led to define a secrecy system as a single transformation on a language.* The enemy, however, does not know what key was chosen, and the "might have been" keys are as important for him as the actual one. Indeed it is only the existence of these other possibilities that gives the system any secrecy. Since the secrecy is our primary interest, we are forced to this rather elaborate concept of a secrecy system. This type of situation, where possibilities are as important as actualities, is almost the rule in games of strategy. The course of a chess game is largely controlled by threats which are not carried out. See also the "virtual existence" of unrealized imputations in von Neumann's theory of games.

*A. A. Albert, in a paper presented at a Manhattan, Kansas, meeting of the American Mathematical Society (Nov. 22, 19..) entitled "Some Mathematical Aspects of Cryptography," has defined a ciphering system in this way. With this limited definition about all one can do is to describe and classify, from the mathematical point of view, various types of transformations.
There are a number of difficult epistemological questions connected with the theory of secrecy, or in fact with any theory which involves questions of probability (particularly a priori probabilities, Bayes' theorem, etc.) when applied to a physical situation. Treated abstractly, probability theory can be put on a rigorous logical basis with the modern measure theory approach.* As applied to reality, however, especially when "subjective" probabilities and unrepeatable experiments are concerned, there are many questions of logical validity. For example, in the approach to secrecy made here, a priori probabilities of various keys are assumed known by the enemy cryptographer. How can one determine operationally if his estimates are correct, on the basis of his knowledge of the situation?
It may happen that the keys are chosen by the encipherer according to one system of probabilities, i.e. one measure in the key space $\Omega_K$, and that the enemy cryptanalyst estimates a second, different system of probabilities $\Omega_K'$ in this space which is entirely reasonable in the light of his knowledge of the situation. Which is correct? I believe that both are correct. The calculation based on $\Omega_K$ leads to the solution when the enemy knows just how the keys are chosen, and the calculation based on $\Omega_K'$ leads to solutions which are correct for a situation agreeing with the enemy's knowledge of the actual situation. It appears intuitively that the enemy's lack of knowledge can only do him harm, and probably this can be proved, but this question has not been investigated. In fact, we assume only one measure $\Omega_K$ in the key space. Similar remarks may be made regarding the measure in the message space $\Omega_M$.
*See J. L. Doob, "Probability as Measure," Annals of Math. Stat., v. 12, 1941, pp. 206-214.
A. Kolmogoroff, "Grundbegriffe der Wahrscheinlichkeitsrechnung," Ergebnisse der Mathematik, v. 2, No. 3 (Berlin, 1933).
Actually, in practical situations only extreme errors in a priori probabilities of keys and messages cause much error in the important parameters. This is because of the exponential behavior of the number of messages, etc., and the logarithmic measures employed.
With regard to the application of the mathematical theory of probability to physical situations there are two main theories or ways of setting up the correspondence. (1) The frequency theory. Probability is correlated with the relative frequency of an event. This is the correspondence used by the practicing statistician, in principle by the physicist, etc. (2) The degree of belief approach. Probability is a subjective phenomenon and measures one's degree of belief in the occurrence of an event. This approach is seen often in the work of historians, judges, and in everyday life. Although this latter approach has often been attacked as meaningless, we cannot agree with this opinion. In the first place, the intuitive approach can be given a rigorous mathematical foundation. This has been done in a very elegant way by B. O. Koopman.* Essentially one need only assume that a person be capable of making probability judgments (event A is more or less probable than event B, or they are equiprobable) and that his judgments be self-consistent (e.g., if he judges A more probable than B and B more probable than C, he should judge A more probable than C). One can even establish numerical values by the use of a "standard gauge," for example a roulette wheel, and thus relate the subjective and the frequency probabilities. In the second place, on pragmatic grounds one can hardly reject the subjective applications, since almost all of our everyday decisions are based on this sort of probability judgment. Cryptographic work involves both types of applications. In the use of frequency tables, significance tests, etc., the cryptanalyst is following the frequency approach. In the "intuitive" methods of cryptanalysis (probable words, etc.) the degree of belief approach is more in evidence.
We may remark that a single operation on a language which is reversible forms a degenerate type of secrecy system under our definition: a system with only one key, of unit probability. Such a system has no secrecy; the cryptanalyst finds the message by applying the inverse of this transformation, the only one in the system, to the intercepted cryptogram. The decipherer and cryptanalyst in this case possess the same information. In general, the only difference between the decipherer's knowledge and the enemy cryptanalyst's knowledge is that the decipherer knows the particular key being used, while the cryptanalyst only knows the a priori probabilities of the various keys in the set. The process of deciphering is that of applying the inverse of the particular transformation used in enciphering to the cryptogram. The process of cryptanalysis is that of attempting to determine the message (or the particular key) given only the cryptogram and the a priori probabilities of various keys and messages.

*B. O. Koopman, "The Axioms and Algebra of Intuitive Probability," Annals of Mathematics, v. 41, no. 2, 1940, p. 269; "Intuitive Probabilities and Sequences," v. 42, no. 1, 1941, p. 169.
A system will be called "closed" if any possible cryptogram can be deciphered with any possible key. This means that the inverse transformations $T_i^{-1}$ are all defined for every element in the cryptogram space.
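Closure can be tested directly when the system is small enough to tabulate. In this sketch a system is a hypothetical dictionary `{key: {message: cryptogram}}`; it is closed exactly when, for every key, the cryptograms reached cover the whole cryptogram space, so that $T_i^{-1}$ is defined everywhere. The function name `is_closed` and both example systems are illustrative.

```python
def is_closed(system, cryptograms):
    """system: {key: {message: cryptogram}}. Closed iff, for every key, the
    inverse T_i^{-1} is defined on every element of the cryptogram space."""
    for key, table in system.items():
        images = list(table.values())
        if len(set(images)) != len(images):
            raise ValueError(f"T_{key} is not uniquely reversible")
        if set(images) != set(cryptograms):
            return False
    return True


closed = {1: {"M1": "E1", "M2": "E2"}, 2: {"M1": "E2", "M2": "E1"}}
not_closed = {1: {"M1": "E1", "M2": "E2"}, 2: {"M1": "E3", "M2": "E1"}}
print(is_closed(closed, ["E1", "E2"]), is_closed(not_closed, ["E1", "E2", "E3"]))   # True False
```

In the second system E3 cannot be deciphered with key 1, so the system is not closed.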
We shall use the notation |M| for the "size" of the message space:

$$|M| = -\sum P(M) \log P(M)$$

where P(M) is the probability of message M and the sum is on all messages of just N letters. Thus |M| is a function of N and measures the amount of "choice" in the selection of an N-letter message. For large N, |M| is approximately RN.

Similarly |K| is the size of the key space,

$$|K| = -\sum P(K) \log P(K),$$

the sum being over all keys.
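Both sizes are plain entropy sums and can be computed by enumeration for small cases. This sketch assumes base-2 logarithms (sizes in alternatives), 26 equally likely keys for |K|, and, for |M|, all N-letter messages from the independent four-letter source of Section 6 (probabilities 1/2, 1/4, 1/8, 1/8), where |M| should equal RN exactly. The helper name `size` is illustrative.

```python
from itertools import product
from math import log2, prod


def size(probabilities):
    """-sum P log2 P over a space, i.e. its 'size' in alternatives."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# |K| for 26 equally likely keys.
key_size = size([1 / 26] * 26)

# |M| for all N-letter messages from the independent four-letter source (R = 7/4).
LETTER_P = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}
N = 3
message_size = size(prod(LETTER_P[c] for c in msg) for msg in product(LETTER_P, repeat=N))
print(round(key_size, 3), message_size)   # 4.7 5.25
```

For an independent source the message-space size grows exactly linearly, |M| = RN; for sources with longer-range structure the relation holds only approximately for large N.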
9. Representation of Systems
A secrecy system can be represented in various ways. One which is convenient for illustrative purposes is a line diagram, as in Figs. 7, 10, 11. The possible messages are represented by points at the left and the possible cryptograms by points at the right. If a certain key, say key 1, transforms message $M_2$ into cryptogram $E_4$, then $M_2$ and $E_4$ are connected by a line labeled 1, etc. From each possible message there must be exactly one line emerging for each different key.
A second representation is by means of a rectangular array. This may be done in three different ways. For the closed system of Fig. 7, the three arrays are as follows:
Message by key (entry: cryptogram)

        K1    K2    K3
M1      E1    E4    E2
M2      E3    E1    E4
M3      E4    E3    E1
M4      E2    E2    E3

Message by cryptogram (entry: key or keys)

        E1    E2     E3    E4
M1      1     3      -     2
M2      2     -      1     3
M3      3     -      2     1
M4      -     1,2    3     -

Cryptogram by key (entry: message)

        K1    K2    K3
E1      M1    M2    M3
E2      M4    M4    M1
E3      M2    M3    M4
E4      M3    M1    M2

Thus, from the first array, key 3 transforms M1 into E2, and M4 goes into E2 by either key 1 or key 2. From the third, E3 is deciphered by key 3 into M4. The three arrays and the line diagram contain equivalent information; from any one of these the others can be derived.
To specify the system completely we must describe the set of transformations and give the probabilities of the various keys. With a small number of keys this may be done by merely listing the keys with their probabilities. Similarly, the message space is not fully specified unless the probabilities of the various messages are given. More commonly, the set of transformations is described by giving the operations one performs on the message for an arbitrary key to obtain the cryptogram. Similarly, one defines the probabilities for the various keys by describing how the key is chosen, or what we know of the enemy's habits of key choice. The probabilities for messages are implicitly determined by stating our a priori knowledge of the enemy's language habits, the tactical situation (which will influence the probable content of the message), and any special information we may have regarding the cryptogram.
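The statement that any one array determines the others can be checked mechanically. This sketch derives the message-by-cryptogram array (sets of keys) and the cryptogram-by-key array (deciphered messages) from a message-by-key array; the small three-message, two-key system and the name `derived_arrays` are hypothetical.

```python
def derived_arrays(first):
    """first: {message: {key: cryptogram}}, the message-by-key array.
    Returns the message-by-cryptogram array (sets of keys) and the
    cryptogram-by-key array (deciphered messages)."""
    second, third = {}, {}
    for message, row in first.items():
        for key, crypt in row.items():
            second.setdefault(message, {}).setdefault(crypt, set()).add(key)
            third.setdefault(crypt, {})[key] = message
    return second, third


# Hypothetical closed system: three messages, two keys.
first = {
    "M1": {1: "E1", 2: "E2"},
    "M2": {1: "E2", 2: "E1"},
    "M3": {1: "E3", 2: "E3"},
}
second, third = derived_arrays(first)
print(second["M3"])   # {'E3': {1, 2}}: M3 goes into E3 by either key
print(third["E1"])    # {1: 'M1', 2: 'M2'}
```

Inverting the construction in the same way recovers the first array from either of the others, which is the sense in which the representations are equivalent.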
10. Notation

The following notation will generally be followed:

M = the message
K = the key
E = the enciphered message or cryptogram
$\Omega_M$ = the message space, a probability space with the message probabilities
$\Omega_K$ = the key space, also a probability space with the key probabilities
$\Omega_E$ = the cryptogram space, also a probability space, since the probabilities in $\Omega_M$ and $\Omega_K$ induce probabilities for each cryptogram
$m_i$ = the i-th letter of the message
$e_i$ = the i-th letter of the cryptogram
$k_i$ = the i-th letter of the key, when it can be so described

Generally P stands for a probability. Conditional probabilities are indicated with subscripts; thus

P(M) = probability of message M
P(E) = probability of cryptogram E
P(K) = probability of key K
$P_M(E)$ = conditional probability of E if message M is chosen
$P_E(M)$ = conditional probability of M if cryptogram E is intercepted, i.e., the a posteriori probability of M if E is observed
Q = equivocation, a concept to be defined precisely later, which measures the uncertainty of some knowledge defined only by probabilities. We also have conditional equivocations; thus $Q_M(K)$ is the equivocation of the key knowing the message.
$|K| = -\sum P(K) \log P(K)$ = the size of the key space
$|M| = -\sum P(M) \log P(M)$ = the size of the message space
$|E| = -\sum P(E) \log P(E)$ = the size of the cryptogram space
m = number of different keys
N = number of intercepted letters
$R_0$ = maximum information rate for a language
R = mean rate
$D = R_0 - R$ = redundancy of a language
T, R, S, etc. = secrecy systems
$T_i, R_i, S_i$, etc. = particular transformations of these systems
11. Some Examples of Secrecy Systems
In this section a number of examples of ciphers will be given. These will often be referred to in the remainder of the paper for illustrative purposes.
1. Simple Substitution Cipher.
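In this cipher each letter of the message is replaced by a fixed substitute, usually also a letter. Thus the message
$$M = m_1 m_2 m_3 m_4 \ldots$$
becomes
$$E = e_1 e_2 e_3 e_4 \ldots$$
where each $e_i$ is the fixed substitute for $m_i$. The key is a permutation of the alphabet (when the substitutes are letters): its first letter is the substitute for A, its second the substitute for B, etc.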
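A minimal Python sketch of simple substitution, assuming a 26-letter alphabet and a key written as a permutation string whose first letter replaces A, second replaces B, and so on. The key `KEY` is a hypothetical example, not one given in the text.

```python
import string

ALPHABET = string.ascii_uppercase  # A..Z, the message alphabet


def substitute(message: str, key: str) -> str:
    """Encipher: replace each letter by its fixed substitute in the key."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of the 26 letters")
    return message.translate(str.maketrans(ALPHABET, key))


def unsubstitute(cryptogram: str, key: str) -> str:
    """Decipher: map each substitute back to the letter it replaced."""
    return cryptogram.translate(str.maketrans(key, ALPHABET))


KEY = "XGUACDTBFHRSLMQVYZWIEJOKNP"  # hypothetical example key
print(substitute("ABC", KEY))  # XGU
```

Deciphering is just the inverse table, which is why any key that is a true permutation can be used.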
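2. Transposition (Fixed Period $d$).
The message is divided into groups of length $d$ and a permutation is applied to the first group, the same permutation to the second group, etc. The permutation is the key and can be represented by a permutation of the first $d$ integers. Thus for $d = 5$ we might have 2 3 1 5 4 as the permutation. This means that
$$m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_{10} \ldots$$
becomes
$$m_2 m_3 m_1 m_5 m_4 m_7 m_8 m_6 m_{10} m_9 \ldots$$
Sequential application of two or more transpositions will be called compound transposition. If the periods are $d_1, d_2, \ldots, d_s$, it is clear that the result is a transposition of period $d$, where $d$ is the least common multiple of $d_1, d_2, \ldots, d_s$.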
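A sketch of fixed-period transposition, assuming the key is given as a 1-based permutation list such as `[2, 3, 1, 5, 4]` and that the message length is a multiple of the period (padding is not handled). The helper names are illustrative; `math.lcm` needs Python 3.9 or later.

```python
from math import lcm


def transpose(message, perm):
    """Fixed-period transposition.

    perm is the key, a permutation of 1..d; position j of every output block
    receives letter number perm[j] of the corresponding input block.
    """
    d = len(perm)
    if sorted(perm) != list(range(1, d + 1)):
        raise ValueError("perm must be a permutation of 1..d")
    if len(message) % d:
        raise ValueError("message length must be a multiple of the period d")
    out = []
    for start in range(0, len(message), d):
        block = message[start:start + d]
        out.extend(block[p - 1] for p in perm)
    return out


def compound_period(*periods):
    """Period of a compound transposition: lcm of the individual periods."""
    return lcm(*periods)


letters = [f"m{i}" for i in range(1, 11)]
print(" ".join(transpose(letters, [2, 3, 1, 5, 4])))
# m2 m3 m1 m5 m4 m7 m8 m6 m10 m9
```

Applying a period-2 transposition and then a period-3 one to the positions 0..11 moves every letter only within its block of 6 and repeats the same rearrangement in each block, as the remark on compound transposition predicts.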
3. Vigenère and Variations.
In this cipher the key consists of a series of $d$ letters. These are written repeatedly below the message and the two are added modulo 26 (considering the letters as numbers from A = 0 to Z = 25). Thus
$$e_i = m_i + k_i \pmod{26}$$
where $k_i$ is of period $d$ in the index $i$. For example, with the key G A H we obtain
message        N O W I S T H E ...
repeated key   G A H G A H G A ...
cryptogram     T O D O S A N E ...
The Vigenère of period 1 is called the Caesar cipher. It is a simple substitution in which each letter of $M$ is advanced a fixed amount in the alphabet. This amount is the key, which may be any number from 0 to 25. The so-called Beaufort and Variant Beaufort are similar to the Vigenère, and encipher by the equations
$$e_i = k_i - m_i \pmod{26}$$
$$e_i = m_i - k_i \pmod{26}$$
respectively. The Beaufort of period one is called the reversed Caesar cipher.
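The three equations above can be sketched directly in Python, assuming uppercase A to Z input with A = 0 and the key repeated with period $d$ = `len(key)`. The function names are illustrative.

```python
from itertools import cycle

A = ord("A")


def _nums(text):
    return [ord(c) - A for c in text]


def _text(nums):
    return "".join(chr(n % 26 + A) for n in nums)


def vigenere(message, key):
    """e_i = m_i + k_i (mod 26), the key repeated with period d = len(key)."""
    return _text(m + k for m, k in zip(_nums(message), cycle(_nums(key))))


def beaufort(message, key):
    """e_i = k_i - m_i (mod 26)."""
    return _text(k - m for m, k in zip(_nums(message), cycle(_nums(key))))


def variant_beaufort(message, key):
    """e_i = m_i - k_i (mod 26); this is also Vigenere decipherment."""
    return _text(m - k for m, k in zip(_nums(message), cycle(_nums(key))))


print(vigenere("NOWISTHE", "GAH"))  # TODOSANE
```

A one-letter key gives the Caesar cipher (for instance `vigenere("ABC", "D")` is `"DEF"`), and applying `beaufort` twice with the same key returns the message.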
The application of two or more Vigenères in sequence will be called the compound Vigenère. It has the equation
$$e_i = m_i + k_i + l_i + \cdots + s_i \pmod{26}$$
where $k_i, l_i, \ldots, s_i$ in general have different periods. The period of their sum,
$$k_i + l_i + \cdots + s_i,$$
as in compound transposition, is the least common multiple of the individual periods.
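A small numerical check of the period of a compound key, using two hypothetical numeric keys of periods 2 and 3. "Period" here means the smallest repeat length of the summed key stream; a degenerate key (for example one whose letters are all equal) can make the sum repeat sooner.

```python
from math import lcm


def repeated(key, length):
    """Key of period len(key) written out to the given length."""
    return [key[i % len(key)] for i in range(length)]


def compound_key(keys, length):
    """Letter-by-letter sum k_i + l_i + ... + s_i (mod 26) of several periodic keys."""
    streams = [repeated(k, length) for k in keys]
    return [sum(column) % 26 for column in zip(*streams)]


def smallest_period(seq):
    """Smallest p such that seq[i] == seq[i + p] wherever both are defined."""
    for p in range(1, len(seq) + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)


combined = compound_key([[1, 0], [0, 0, 5]], 24)  # hypothetical keys, periods 2 and 3
print(smallest_period(combined), lcm(2, 3))       # 6 6
```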
4. Vernam System.*
When the Vigenère is used with an unlimited key, never repeating, we have the Vernam system, with
$$e_i = m_i + k_i \pmod{26}$$
the $k_i$ being chosen at random and independently among 0, 1, ..., 25. If the key is a meaningful text we have the "running key" cipher.
5. Bazeries Cylinder.
In this mechanical system 25 thick disks are used, each having a mixed alphabet stamped around the edge. These disks can be arranged in any order on a spindle, and the particular arrangement used constitutes the key. With the disks in their proper order, a message is enciphered by turning the disks so that the message appears on a line parallel to the axis of the spindle. Any other line of letters may then be chosen for the cryptogram. To decipher, the cryptogram is arranged on a line and the decipherer looks for another line which then makes sense.
*G. S. Vernam, "Cipher Printing Telegraph Systems for Secret Wire and Radio Telegraphic Communications," Journal Amer. Inst. of Elect. Eng., v. XLV, pp. 109-115, 1926.
6. Digram, Trigram, and N-gram Substitution.
Rather than substitute for letters one can substitute for digrams, trigrams, etc. General digram substitution requires a key consisting of a permutation of the $26^2$ digrams. It can be represented by a table in which the row corresponds to the first letter of the digram and the column to the second letter, entries in the table being the substitutes (usually also digrams).
7. Interrupted Key Vigenère.
The Vigenère and its variations can be used with interrupted key. The sequence of key letters is started over at irregularly spaced points. Thus, if the entire key sequence is X P G H F T R S, one can interrupt irregularly to get
X P G H F T | X P G H | X P ...
The points of interruption can be determined in various ways:
(1) Whenever a certain letter occurs in the clear.
(2) Whenever a certain letter occurs in the cryptogram.
(3) An interrupting letter, say J, can be reserved as a signal, and the encipherer interrupts the key at his discretion.
(4) No signal is used, and the decipherer locates the interruption by the appearance of meaningless text in the decipherment.
In place of starting the key again at each interruption one can omit letters of it or reverse the direction of progression. There are many variations and combinations of these methods.
8. Single Mixed Alphabet Vigenère.
This is a simple substitution followed by a Vigenère:
$$e_i = f(m_i) + k_i$$
The "inverse" of this system is a Vigenère followed by simple substitution:
$$e_i = g(m_i + k_i)$$
$$m_i = g^{-1}(e_i) - k_i$$
9. Vigenère with Progressing Key.
The period of a Vigenère can be expanded by adding a fixed number $t$ to the key at each appearance; thus the $n$th group is enciphered by the equation
$$e_i = m_i + k_i + nt$$
Also this can be varied by adding $t$ and $s$ alternately to the key, etc.
10. Matrix System.*
One method of $n$-gram substitution is to operate on successive $n$-grams with a matrix having an inverse. The letters are assumed numbered from 0 to 25, making them elements of an algebraic ring. From the $n$-gram $m_1 m_2 \ldots m_n$ of message, the matrix $a_{ij}$ gives an $n$-gram of cryptogram
$$e_i = \sum_{j=1}^{n} a_{ij} m_j, \qquad i = 1, \ldots, n.$$
The matrix is the key, and deciphering is performed with the inverse matrix. The inverse matrix will exist if and only if the determinant $|a_{ij}|$ has an inverse element in the ring.
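A minimal sketch of the matrix system for $n = 2$ over the integers mod 26, restricted to 2 x 2 matrices to keep the inverse formula short. The key matrix is a hypothetical example; `pow(det, -1, n)` (Python 3.8 or later) computes the modular inverse and raises `ValueError` when none exists, so the explicit `gcd` test just makes the condition in the text visible.

```python
from math import gcd


def mat_vec(a, v, n=26):
    """e_i = sum_j a_ij m_j (mod n)."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) % n for i in range(len(a))]


def inverse_2x2(a, n=26):
    """Inverse of a 2 x 2 key matrix over the integers mod n, if it exists."""
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % n
    if gcd(det, n) != 1:
        raise ValueError("determinant has no inverse mod n: unusable key")
    d_inv = pow(det, -1, n)  # modular inverse of the determinant
    return [[a[1][1] * d_inv % n, -a[0][1] * d_inv % n],
            [-a[1][0] * d_inv % n, a[0][0] * d_inv % n]]


KEY = [[3, 3], [2, 5]]          # hypothetical key, determinant 9
digram = [7, 4]                 # "HE" with A = 0, ..., Z = 25
cipher = mat_vec(KEY, digram)   # [7, 8], i.e. "HI"
plain = mat_vec(inverse_2x2(KEY), cipher)
print(cipher, plain)            # [7, 8] [7, 4]
```

A matrix such as `[[2, 0], [0, 1]]` is rejected: its determinant 2 shares a factor with 26, so it has no inverse element in the ring.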
11. The Playfair Cipher.
This is a particular type of digram substitution governed by a mixed 25-letter alphabet written in a 5 x 5 square. (The letter J is often dropped in cryptographic work: it is very infrequent, and when it occurs it can be replaced by I.) Suppose the key square is as shown below:
L Z Q C P
A G N O U
R D M I F
K Y H V S
X B T E W
*See L. S. Hill, "Cryptography in an Algebraic Alphabet," American Math. Monthly, v. 36, No. 6, 1929, pp. 306-312. Also "Concerning Certain Linear Transformation Apparatus of Cryptography," v. 38, No. 3, 1931, pp. 135-154.
The substitute for a digram AC, for example, is the pair of letters at the other corners of the rectangle defined by A and C, i.e. LO, the L taken first since it is above A. If the digram letters are on a horizontal line, as RI, one uses the letters to their right, DF; RF becomes DR. If the letters are on a vertical line, the letters below them are used. Thus PS becomes UW. If the letters are the same, nulls may be used to separate them or one may be omitted, etc.
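The digram rules above can be sketched as one Python function, using the key square from the text and wrapping around the edges of the square. Many later descriptions of Playfair put the corner in the first letter's row first (which would turn AC into OL); this sketch follows the "L above A" ordering stated here. Doubled letters are rejected rather than handled with nulls.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]  # key square from the text


def playfair_digram(a, b, square=SQUARE):
    """Substitute for one digram, following the rules stated in the text."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    if a == b:
        raise ValueError("doubled letters must first be separated by a null")
    (r1, c1), (r2, c2) = pos[a], pos[b]
    if r1 == r2:   # same row: take the letters to the right, wrapping around
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:   # same column: take the letters below, wrapping around
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    # rectangle: the corner in the first letter's column comes first
    return square[r2][c1] + square[r1][c2]


for pair in ("AC", "RI", "RF", "PS"):
    print(pair, "->", playfair_digram(*pair))  # LO, DF, DR, UW
```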
12. Multiple Mixed Alphabet Substitution.
In this cipher there is a set of $d$ simple substitutions which are used in sequence. If the period $d$ is four,
$$m_1 m_2 m_3 m_4 m_5 m_6 \ldots$$
becomes
$$f_1(m_1) f_2(m_2) f_3(m_3) f_4(m_4) f_1(m_5) f_2(m_6) \ldots$$
13. Autokey Cipher.
A Vigenère type system in which either the message itself or the resulting cryptogram is used for the "key" is called an autokey cipher. The encipherment is started with a "priming key" (which is the entire key in our sense) and continued with the message or cryptogram displaced by the length of the priming key, as indicated below with the priming key COMET.
The message used as "key":
MESSAGE      S E N D S U P P L I E S ...
KEY          C O M E T S E N D S U P ...
CRYPTOGRAM   U S Z H L M T C O A Y H ...
The cryptogram used as "key":
MESSAGE      S E N D S U P P L I E S ...
KEY          C O M E T U S Z H L O H ...
CRYPTOGRAM   U S Z H L O H O S T S Z ...
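Both autokey variants can be sketched as a Vigenère whose key list grows by one letter per enciphered letter, assuming uppercase input with A = 0. The flag `use_cryptogram` is an illustrative parameter choosing which text feeds the key.

```python
A = ord("A")


def autokey(message, priming_key, use_cryptogram=False):
    """Autokey Vigenere: after the priming key runs out, the key continues with
    the message (default) or with the cryptogram, displaced by the priming length."""
    key = [ord(c) - A for c in priming_key]
    out = []
    for i, ch in enumerate(message):
        m = ord(ch) - A
        e = (m + key[i]) % 26
        out.append(chr(e + A))
        key.append(e if use_cryptogram else m)  # feed the running key
    return "".join(out)


print(autokey("SENDSUPPLIES", "COMET"))        # USZHLMTCOAYH
print(autokey("SENDSUPPLIES", "COMET", True))  # USZHLOHOSTSZ
```

Both outputs agree with the two tables; the last letter of the second cryptogram, Z, follows from S + H (mod 26).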
14. Fractional Ciphers.
In these, each letter is first enciphered into two or more letters or numbers and these symbols are somehow mixed (e.g. by transposition). The result may then be retranslated into the original alphabet. Thus, using a mixed 25-letter alphabet for the key, we may translate letters into two-digit quinary numbers by the table
     0 1 2 3 4
0    L Z Q C P
1    A G N O U
2    R D M I F
3    K Y H V S
4    X B T E W
Thus B becomes 41. After the resulting series of numbers is transposed in some way, they are taken in pairs and translated back into letters.
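A sketch of a fractional cipher built on the table above. The text leaves the mixing step open ("transposed in some way"), so `mix` here is a hypothetical choice: all row digits first, then all column digits, similar in spirit to Delastelle's bifid.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]  # rows and columns 0..4
COORDS = {ch: f"{r}{c}" for r, row in enumerate(SQUARE) for c, ch in enumerate(row)}


def to_digits(text):
    """Each letter becomes a two-digit quinary number: row digit, column digit."""
    return "".join(COORDS[ch] for ch in text)


def from_digits(digits):
    """Take the digits in pairs and translate back into letters."""
    return "".join(SQUARE[int(digits[i])][int(digits[i + 1])]
                   for i in range(0, len(digits), 2))


def mix(digits):
    """Hypothetical mixing step: all even-position digits, then all odd ones."""
    return digits[0::2] + digits[1::2]


def unmix(digits):
    """Inverse of mix: interleave the two halves again."""
    half = len(digits) // 2
    return "".join(r + c for r, c in zip(digits[:half], digits[half:]))


print(to_digits("B"))                     # 41
print(from_digits(mix(to_digits("AB"))))  # UZ
```

Deciphering reverses the steps: `from_digits(unmix(to_digits("UZ")))` gives back `"AB"`.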
15. Codes.
In codes, words (or sometimes syllables) are replaced by substitute letter groups. Sometimes a cipher of one kind or another is applied to the result.
12. Valuations of Secrecy Systems
There are a number of different criteria that should be applied in estimating the value of a proposed secrecy system. The more important of these are:
1. Amount of Secrecy.
There are some systems that are perfect: the enemy is no better off after intercepting any amount of material than before. Other systems, although giving him some information, do not yield a unique "solution" to intercepted cryptograms. Among the uniquely solvable systems, there are wide variations in the amount of labor required to effect this solution and in the amount of material that must be intercepted to make the solution unique.
2. Size of Key.
The key must be transmitted by non-interceptible means from transmitting to receiving ends. Sometimes it must be memorized. It is desirable then to have the key as small as possible.
3. Complexity of Enciphering and Deciphering Operations.
These should, of course, be as simple as possible. If they are done manually, complexity leads to loss of time, errors, etc. If done mechanically, complexity leads to large, expensive machines.
4. Propagation of Errors.
In certain types of secrecy systems an error of one letter in enciphering or transmission leads to a large amount of error in the deciphered text. The errors are spread out by the deciphering operation, causing the loss of much information and frequent need for repetition of the cryptogram. It is naturally desirable to minimize this error expansion.
5. Expansion of Message.
In some types of secrecy systems the size of the message is increased by the enciphering process. This undesirable effect may be seen in systems where one attempts to swamp out message statistics by the addition of many nulls, or where multiple substitutes are used. It also occurs in many "concealment" types of systems (which are not usually secrecy systems in the sense of our definition).
13. Equivalence Classes in the Key Space
It may happen that in a ciphering system two or more different keys, say keys 1, 2, and 7, are equivalent. By this we mean that for every $M$
$$T_1 M = T_2 M = T_7 M.$$
These keys will not be considered as distinct but will be thrown into an equivalence class. It is clear that the cryptanalyst can never determine which particular one of these was used, but only (at best) the class. The probability for the class is of course the sum of the probabilities of the different keys in the class.
As an exemple, in- the Playfair cipher with the s;
given above, the following are equivalent key squares.
GHXPY X C I 2 T
Z F E C.I JB'Dl.O
LONRD V S <} T A
T A V S Q t W B MK U
K U W B M IP Y GH
We can think of the possible equivalence classes in this c
as arrangements of a 25 letter alphabet on a 5 x 5 square
on an oriented torus. The number of different .keys is not
but 251/52 - 241
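A brute-force check of this equivalence, assuming the two squares read as G H X P Y / Z F E C I / ... and E C I Z F / N R D L O / ... (the second is the first shifted one row up and two columns left on the torus). The check compares the substitutes for every digram of distinct letters, using the Playfair rules stated in section 11; a transposed square, by contrast, is not equivalent.

```python
from itertools import permutations
from math import factorial

SQ1 = ["GHXPY", "ZFECI", "LONRD", "TAVSQ", "KUWBM"]
SQ2 = ["ECIZF", "NRDLO", "VSQTA", "WBMKU", "XPYGH"]  # SQ1 shifted on the torus


def playfair_digram(a, b, square):
    """Digram rules as stated for the Playfair cipher in section 11."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (r1, c1), (r2, c2) = pos[a], pos[b]
    if r1 == r2:
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    return square[r2][c1] + square[r1][c2]


def same_cipher(s1, s2):
    """True if both squares give the same substitute for every digram of distinct letters."""
    return all(playfair_digram(a, b, s1) == playfair_digram(a, b, s2)
               for a, b in permutations("".join(s1), 2))


transposed = ["".join(col) for col in zip(*SQ1)]
print(same_cipher(SQ1, SQ2), same_cipher(SQ1, transposed))  # True False
print(factorial(25) // 5**2 == factorial(24))               # True
```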
When we say that two secrecy systems are the same, we mean that they consist of the same set of transformations, with the same message and cryptogram spaces (domain and range), and the same probabilities for the different keys (after identical transformations are put in the same equivalence class).
14. The Algebra of Secrecy Systems
If we have two secrecy systems $T$ and $R$ we can often combine them in various ways to form a new secrecy system $S$. If $T$ and $R$ have the same domain (message space) we may form a kind of "weighted sum,"
$$S = pT + qR$$
where $p + q = 1$. This operation consists of first making a preliminary choice, with probabilities $p$ and $q$, determining which of $T$ and $R$ is used. This choice is part of the key of $S$. After this is determined, $T$ or $R$ is used as originally defined. The total key of $S$ must specify which of $T$ and $R$ is used and which key of $T$ (or $R$) is used.
If $T$ consists of the transformations $T_1, \ldots, T_m$ with probabilities $p_1, \ldots, p_m$, and $R$ consists of $R_1, \ldots, R_k$ with probabilities $q_1, \ldots, q_k$, then $S = pT + qR$ consists of the transformations $T_1, T_2, \ldots, T_m, R_1, \ldots, R_k$ with probabilities $pp_1, pp_2, \ldots, pp_m, qq_1, qq_2, \ldots, qq_k$ respectively.
More generally we can form the sum of a number of systems:
$$S = p_1 T + p_2 R + \cdots + p_m U, \qquad \sum p_i = 1.$$
We note that any system $T$ can be written as a sum of fixed operations
$$T = p_1 T_1 + p_2 T_2 + \cdots + p_m T_m,$$
$T_i$ being a definite enciphering operation of $T$ corresponding to key choice $i$, which has probability $p_i$.
A second way of combining two secrecy systems is taking the "product," shown schematically in Fig. 8. Suppose $T$ and $R$ are two systems and the domain (language space) of $T$ can be identified with the range (cryptogram space) of $R$. Then we can apply first $R$ to our language and then $T$ to the result of this enciphering process. This gives a resultant operation $S$ which we write as a product
$$S = TR.$$
The key for $S$ consists of both keys of $T$ and $R$, which are assumed chosen according to their original probabilities and independently. Thus if the $m$ keys of $T$ are chosen with probabilities
$$p_1, p_2, \ldots, p_m$$
and the $n$ keys of $R$ have probabilities
$$p'_1, p'_2, \ldots, p'_n$$
then $S$ has $mn$ keys (at most; there may and often will be equivalence classes) with probabilities $p_i p'_j$. This type of product encipherment is often used; for example, one follows a substitution by a transposition or a transposition by a Vigenère, or applies a code to the text and enciphers the result by substitution, transposition, fractionation, etc.
A more special type of product may be defined in case both $T$ and $R$ have keys of the same size which may be put in one-to-one correspondence, with the same probabilities for corresponding keys. This may be called the "inner product," in contrast with the above, which may be more completely described as an "outer product" (these names are derived from a rough analogy with the concepts of tensor analysis). In the inner product, written
$$S = T \circ R$$
and indicated schematically in Fig. 9, the same key (or corresponding keys) is used for both $T$ and $R$, chosen with the common probability.
For example, one may construct a transposition cipher whose key is a permutation of the alphabet, each permutation being equally likely, and apply first this and then a substitution based on the same permutation. One also sees this situation in certain geometrical types of transposition ciphers, where the text is written into a square and a permutation based on a key word is applied first to the columns and then to the rows of the square.
It may be noted that multiplication (either kind) is not in general commutative (we do not always have $RS = SR$), although in special cases, such as substitution and transposition, it is. Since it represents an operation it is definitionally associative. That is, $R(ST) = (RS)T = RST$. Furthermore we have the laws
$$p(p'T + q'R) + qS = pp'T + pq'R + qS$$
(weighted associative law for addition)
$$T(pR + qS) = pTR + qTS$$
$$(pR + qS)T = pRT + qST$$
(right and left hand distributive laws)
and
$$p_1 T + p_2 T + p_3 R = (p_1 + p_2) T + p_3 R.$$
Finally, with regard to this algebraic structure of secrecy operations, we note that every closed secrecy system has an "inverse" $T'$, obtained by interchanging the $E$ and $M$ spaces, with key probabilities the same, and
$$(TRS)' = S'R'T'$$
$$(pT + qR)' = pT' + qR'$$
Note that $TT'$ is not in general the identity (this is the reason we do not write $T^{-1}$).
A system whose $M$ and $E$ spaces can be identified, a very common case, as when letter sequences are transformed into letter sequences, may be termed endomorphic. An endomorphic system $T$ may be raised to a power $T^n$.
A secrecy system $T$ whose outer product with itself is equal to $T$, i.e. for which
$$TT = T,$$
will be called idempotent. For example, simple substitution, transposition of period $p$, and Vigenère of period $p$ (all with each key equally likely) are idempotent.
The set of all endomorphic secrecy systems defined in a fixed message space constitutes an "algebraic variety," that is, a kind of algebra, using the operations of addition and multiplication. In fact, the properties of addition and multiplication which we have discussed lead to the following result:
Theorem 1: The set of endomorphic ciphers with the same message space and the two combining operations of weighted addition and outer multiplication form a linear associative algebra with a unit element, apart from the fact that the coefficients in a weighted addition must be non-negative and sum to unity.
It should be emphasized that these combining operations of addition and multiplication apply to secrecy systems as a whole. The product of two systems $TR$ should not be confused with the product of the transformations in the systems, $T_i R_j$, which also appears often in this work. The former, $TR$, is a secrecy system, i.e. a set of transformations with associated probabilities; the latter is a particular transformation. Further, the sum of two systems $pR + qT$ is a system; the sum of two transformations is not defined. The systems $T$ and $R$ may commute without the individual $T_i$ and $R_j$ commuting. For example, if $R$ is a Beaufort system of a given period, all keys equally likely,
$$R_i R_j \neq R_j R_i$$
in general, but of course $RR$ does not depend on its order; actually
$$RR = V,$$
the Vigenère of the same period with random key. On the other hand, if the individual $T_i$ and $R_j$ of two systems $T$ and $R$ commute, then the systems commute.
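A sketch of this algebra in Python, under two simplifying assumptions: systems act on single letters 0..25 (so only period 1 is covered), and a system is represented as a dictionary from a transformation (a tuple giving the image of each letter) to its probability, with equivalent keys merged. It checks the Beaufort claim $RR = V$, the idempotence $VV = V$, one distributive law, and that individual Beaufort transformations do not commute.

```python
from fractions import Fraction

N = 26  # letters are the numbers 0..25; a transformation is a tuple image[m]


def shift(k):
    """Caesar (Vigenere of period 1): m -> m + k."""
    return tuple((m + k) % N for m in range(N))


def reflect(k):
    """Reversed Caesar (Beaufort of period 1): m -> k - m."""
    return tuple((k - m) % N for m in range(N))


def uniform(transforms):
    """System with equally likely keys; equivalent keys merge into one class."""
    system = {}
    for t in transforms:
        system[t] = system.get(t, 0) + Fraction(1, len(transforms))
    return system


def product(T, R):
    """Outer product TR: apply R first, then T, keys chosen independently."""
    S = {}
    for t, pt in T.items():
        for r, pr in R.items():
            tr = tuple(t[r[m]] for m in range(N))
            S[tr] = S.get(tr, 0) + pt * pr
    return S


def weighted_sum(p, T, R):
    """pT + (1 - p)R: a preliminary choice of T or R is part of the key."""
    S = {}
    for system, w in ((T, p), (R, 1 - p)):
        for t, pt in system.items():
            S[t] = S.get(t, 0) + w * pt
    return S


V = uniform([shift(k) for k in range(N)])    # Vigenere of period 1
B = uniform([reflect(k) for k in range(N)])  # Beaufort of period 1
print(product(B, B) == V)  # True: RR = V for the Beaufort
```

Exact `Fraction` arithmetic is used so that equal systems compare equal without floating-point tolerance.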
It is rather surprising to find an algebraic variety with as much structure as a linear associative algebra in which the elements have the complexity of ciphers. In Hilbert space theory, for example, one has a linear associative algebra, but the elements of the algebra are transformations. Here the elements are sets of transformations with a probability space associated with the transformation parameter.
These combining operations give us ways of constructing many new types of secrecy systems from certain ones, such as the examples given. We may also use them to describe the situation facing a cryptanalyst when attempting to solve a cryptogram of unknown type. He is, in fact, solving a secrecy system of the type
$$T = p_1 A + p_2 B + \cdots + p_r S + p' X, \qquad \sum p = 1,$$
where $A, B, \ldots, S$ are known types of ciphers, with the $p_i$ their a priori probabilities in this situation, and $p'X$ corresponds to the possibility of a completely new, unknown type of cipher.
In weighted addition the key size of the result is given by
$$|K| = p|K_1| + q|K_2| - (p \log p + q \log q) = p|K_1| + q|K_2| + |K_3|,$$
i.e. the weighted mean of the two key sizes plus the size $|K_3|$ of the $p, q$ key. This holds only in case there are no equivalences; if there are, the key size will always be less.
For the outer product the key size is
$$|K| \le |K_1| + |K_2|$$
with equality only when there are no equivalences. In the inner product
$$|K| \le |K_1| = |K_2|$$
with equality under the same condition.
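A numerical check of the weighted-sum and outer-product key sizes, taking logarithms to base 2 (an assumption; the text leaves the base unspecified) and two hypothetical uniform key spaces with no equivalent keys.

```python
from math import isclose, log2


def entropy(probs):
    """Key size |K| = -sum P(K) log2 P(K), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def weighted_sum_keys(p, keys1, keys2):
    """Key probabilities of pT + (1 - p)R, assuming no equivalent keys."""
    return [p * x for x in keys1] + [(1 - p) * y for y in keys2]


def outer_product_keys(keys1, keys2):
    """Key probabilities of TR with independent key choices, no equivalences."""
    return [x * y for x in keys1 for y in keys2]


K1 = [1 / 4] * 4   # hypothetical: 4 equally likely keys, |K1| = 2 bits
K2 = [1 / 8] * 8   # hypothetical: 8 equally likely keys, |K2| = 3 bits
p, q = 0.25, 0.75

lhs = entropy(weighted_sum_keys(p, K1, K2))
rhs = p * entropy(K1) + q * entropy(K2) - (p * log2(p) + q * log2(q))
print(isclose(lhs, rhs))                                  # True
print(isclose(entropy(outer_product_keys(K1, K2)), 5.0))  # True: 2 + 3 bits
```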
15. Pure and Mixed Ciphers
Certain types of ciphers, such as the simple substitution, the transposition of a given period, the Vigenère of a given period, the mixed alphabet Vigenère, etc. (all with each key equally likely), have a certain homogeneity with respect to key. Whatever the key, the enciphering, deciphering and decrypting processes are essentially the same. This may be contrasted with the cipher
$$pS + qT$$
where $S$ is a simple substitution and $T$ a transposition of a given period. In this case the entire system changes for enciphering, deciphering and decryptment, depending on whether the substitution or transposition was used.
The cause of the homogeneity in these ciphers stems from the group property: we notice that in the above examples of homogeneous ciphers the product $T_i T_j$ of any two transformations in the set is equal to a third transformation $T_k$ in the set, while $T_i S_j$ does not equal any transformation in the cipher
$$pS + qT,$$
which contains only substitutions and transpositions, no products.
We might define a "pure" cipher, then, as one whose $T_i$ form a group. This, however, would be too restrictive, since it requires that the $E$ space be the same as the $M$ space, i.e. that the system be endomorphic. The fractional transposition is as homogeneous as the ordinary transposition without being endomorphic. The proper definition is the following: a cipher $T$ is pure if for every $T_i, T_j, T_k$ there is a $T_s$ such that
$$T_i T_j^{-1} T_k = T_s$$
and every key is equally likely. Otherwise the cipher is mixed. The systems of Fig. 7 are mixed. Fig. 10 is pure if all keys are equally likely.
Theorem 2: In a pure cipher the operations $T_i^{-1} T_j$, which transform the message space into itself, form a group whose order is $m$, the number of different keys.
For
$$T_j^{-1} T_k T_k^{-1} T_j = I$$
so that each element has an inverse. Also the associative law is true since these are operations, and the group property follows from
$$T_i^{-1} T_j T_k^{-1} T_l = T_s^{-1} T_k T_k^{-1} T_l = T_s^{-1} T_l$$
using our assumption that $T_i^{-1} T_j = T_s^{-1} T_k$ for some $s$.
The operation $T_i^{-1} T_j$ means, of course, enciphering the message with key $j$ and then deciphering with key $i$, which brings us back to the message space. If $T$ is endomorphic, i.e. the $T_i$ themselves transform the space $\Omega_M$ into itself (as is the case with most ciphers, where both the message space and the cryptogram space consist of sequences of letters), and the $T_i$ are a group and equally likely, then $T$ is pure, since
$$T_i T_j^{-1} T_k = T_i T_r = T_s.$$
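A brute-force check of the purity condition for transformations of single letters, represented as permutation tuples. Only the closure condition $T_i T_j^{-1} T_k \in T$ is tested; equal key probabilities are assumed rather than checked. The Caesar shifts pass and form a group of order 26, while the two-key set of shifts 0 and 1 (a hypothetical example) fails.

```python
from itertools import product as triples

N = 26


def compose(f, g):
    """(f g)(m) = f(g(m)): apply g first, then f."""
    return tuple(f[g[m]] for m in range(len(g)))


def inverse(f):
    inv = [0] * len(f)
    for m, e in enumerate(f):
        inv[e] = m
    return tuple(inv)


def closed_under_purity(transforms):
    """Check that T_i T_j^{-1} T_k is again one of the transformations.

    Purity also needs equally likely keys; that part is assumed, not checked.
    """
    s = set(transforms)
    return all(compose(compose(ti, inverse(tj)), tk) in s
               for ti, tj, tk in triples(s, repeat=3))


def shift(k):
    return tuple((m + k) % N for m in range(N))


caesar = [shift(k) for k in range(N)]
print(closed_under_purity(caesar))                # True
print(closed_under_purity([shift(0), shift(1)]))  # False
group = {compose(inverse(a), b) for a in caesar for b in caesar}
print(len(group))                                 # 26, the number of keys
```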
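Theorem 3: The outer product of two pure ciphers which commute is pure.
For if $T$ and $R$ commute, $T_i R_j = R_{l'} T_{m'}$ for every $i, j$ with suitable $l', m'$, and
$$T_i R_j (T_k R_l)^{-1} T_m R_n = T_i R_j R_l^{-1} T_k^{-1} T_m R_n = R_u R_v^{-1} R_w T_r T_s^{-1} T_t = R_h T_g.$$
The commutation condition is not necessary, however, for the product to be a pure cipher.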
A system with only one key, i.e. a single definite operation $T_1$, is pure, since the only choice of indices is
$$T_1 T_1^{-1} T_1 = T_1.$$
Thus the expansion of a general cipher into a sum of such simple transformations also exhibits it as a sum of pure ciphers.
An examination of the example of a pure cipher shown in Fig. 5 discloses certain properties. The messages fall into certain subsets which we will call residue classes, and the possible cryptograms are divided into corresponding residue classes. There is at least one line from each message in a class to each cryptogram in the corresponding class, and no line between classes which do not correspond. The number of messages in a class is a divisor of the total number of keys. The number of lines "in parallel" from a message $M$ to a cryptogram in the corresponding class is equal to the number of keys divided by the number of messages in the class containing the message (or cryptogram). It is shown in the appendix that these hold in general for pure ciphers. Summarized in a more formal statement we have:
Theorem 4: In a pure system the messages can be divided into a set of "residue classes" $C_1, C_2, \ldots, C_s$ and the cryptograms into a corresponding set of residue classes $C'_1, C'_2, \ldots, C'_s$ with the following properties:
(1) The message residue classes are mutually exclusive and collectively contain all possible messages. Similarly for the cryptogram residue classes.
(2) Enciphering any message in $C_i$ with any key produces a cryptogram in $C'_i$. Deciphering any cryptogram in $C'_i$ with any key leads to a message in $C_i$.
(3) The number of messages in $C_i$, say $\varphi_i$, is equal to the number of cryptograms in $C'_i$ and is a divisor of $k$, the number of keys.
(4) Each message in $C_i$ can be enciphered into each cryptogram in $C'_i$ by exactly $k/\varphi_i$ different keys. The same holds for decipherment.
The importance of the concept of a pure cipher (and the reason for the name) lies in the fact that in a pure cipher all keys are essentially the same. Whatever key is used for a particular message, the a posteriori probabilities of all messages are identical. To see this, note that two different keys applied to the same message lead to two cryptograms in the same residue class, say $C'_i$. The two cryptograms therefore could each be deciphered by $k/\varphi_i$ keys into each message in $C_i$ and into no other possible messages. All keys being equally likely, $P_M(E) = 1/\varphi_i$ for every $M$ in $C_i$, and the a posteriori probabilities of the various messages are thus
$$P_E(M) = \frac{P(M) P_M(E)}{\sum_{M' \in C_i} P(M') P_{M'}(E)} = \frac{P(M)}{\sum_{M' \in C_i} P(M')}$$
where $M$ is in $C_i$, $E$ is in $C'_i$, and the sum is over all messages in $C_i$. If $E$ and $M$ are not in corresponding residue classes, $P_E(M) = 0$. Similarly it can be shown that the a posteriori probabilities of the different keys are the same in value, but these values are associated with different keys when a different key is used. The same set of values of $P_E(K)$ has undergone a permutation among the keys. Thus we have the result:
Theorem 5: In a pure system the a posteriori probabilities of various messages $P_E(M)$ are independent of the key that is chosen. The a posteriori probabilities of the keys $P_E(K)$ are the same in value but undergo a permutation with a different key choice.
Roughly, we may say that any key choice leads to the same cryptanalytic problem in a pure cipher. Since the different keys all result in cryptograms in the same residue class, this means that all cryptograms in the same residue class are cryptanalytically equivalent: they lead to the same a posteriori probabilities of messages and, apart from a permutation, the same probabilities of keys.
As an example of this, simple substitution with all keys equally likely is a pure cipher. The residue class corresponding to a given cryptogram $E$ is the set of all cryptograms that may be obtained from $E$ by operations $T_i T_k^{-1} E$. In this case $T_i T_k^{-1}$ is itself a substitution, and hence any substitution on $E$ gives another member of the same residue class. Thus if the cryptogram is
E = X C P P G C F Q
then
E1 = R D H H G D S N
E2 = A B C C D B E F
etc. are in the same residue class. It is obvious in this case that these cryptograms are essentially equivalent. All that is of importance in a simple substitution with random key is the pattern of letter repetitions, the actual letters being dummy variables. Indeed we might dispense with them entirely, indicating the pattern of repetitions in $E$ as follows:*
[Diagram: the eight letter positions of E, with positions 2 and 6 linked and positions 3 and 4 linked.]
This notation describes the residue class but eliminates all information as to the specific member of the class. Thus it leaves precisely that information which is cryptanalytically pertinent. This is related to one method of attacking simple substitution ciphers, the method of pattern words.
In the Caesar type cipher only the first differences mod 26 of the cryptogram are significant. Two cryptograms with the same $\Delta e_i$ are in the same residue class. One breaks this cipher by the simple process of writing down the 26 members of the message residue class and picking out the one which makes sense.
The Vigenère of period $d$ with random key is another example of a pure cipher. Here the message residue class consists of all sequences with the same first differences, for letters separated by distance $d$, as the cryptogram. For $d = 3$ the residue class is defined by
$$m_1 - m_4 = e_1 - e_4$$
$$m_2 - m_5 = e_2 - e_5$$
$$m_3 - m_6 = e_3 - e_6$$
$$m_4 - m_7 = e_4 - e_7$$
$$\cdots$$
where $E = e_1, e_2, \ldots$ is the cryptogram and $m_1, m_2, \ldots$ is any $M$ in the corresponding residue class.
*Suggested by a notation used by Quine in Symbolic Logic.
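The residue-class invariants for the Caesar and Vigenère ciphers can be checked with one helper: differences of letters a distance $d$ apart, mod 26 (the Caesar cipher is the case $d = 1$). The sample Caesar cryptogram "DWWDFN" is hypothetical (ATTACK advanced by 3); the Vigenère pair is the N O W I S T H E example from section 11.

```python
A = ord("A")


def periodic_differences(text, d):
    """Differences x_i - x_{i+d} (mod 26); for d = 1 these are the first differences."""
    x = [ord(c) - A for c in text]
    return tuple((x[i] - x[i + d]) % 26 for i in range(len(x) - d))


def caesar_residue_class(cryptogram):
    """The 26 candidate messages for a Caesar cryptogram, one per key."""
    return ["".join(chr((ord(c) - A - k) % 26 + A) for c in cryptogram)
            for k in range(26)]


# Vigenere of period 3: message and cryptogram share the distance-3 differences.
print(periodic_differences("NOWISTHE", 3) == periodic_differences("TODOSANE", 3))  # True

# Caesar: write down all 26 members and pick the one that makes sense.
members = caesar_residue_class("DWWDFN")
print("ATTACK" in members)  # True
```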
In the transposition cipher of period $d$ with random key, the residue class consists of all arrangements of the $e_i$ in which no $e_i$ is moved out of its block of length $d$, and any two $e_i$ at a distance $d$ remain at this distance. This is used in breaking these ciphers as follows. The cryptogram is written in successive blocks of length $d$, one under another as below ($d = 5$):
$$\begin{array}{ccccc} e_1 & e_2 & e_3 & e_4 & e_5 \\ e_6 & e_7 & e_8 & e_9 & e_{10} \\ e_{11} & e_{12} & \cdot & \cdot & \cdot \end{array}$$
The columns are then cut apart and rearranged to make sense. When the columns are cut apart, the only information remaining is the residue class of the cryptogram.
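The column-cutting procedure can be sketched as follows, assuming the cryptogram length is a multiple of $d$. The example cryptogram is hypothetical: "SENDSUPPLI" transposed with the period-5 key 2 3 1 5 4 from section 11, and the column order passed to `reassemble` is the one a cryptanalyst would arrive at by trial.

```python
def columns(cryptogram, d):
    """Write the cryptogram in rows of length d and cut it into its d columns."""
    rows = [cryptogram[i:i + d] for i in range(0, len(cryptogram), d)]
    return ["".join(row[j] for row in rows) for j in range(d)]


def reassemble(cols, order):
    """Place the cut-apart columns side by side in the given order; read by rows."""
    chosen = [cols[j] for j in order]
    return "".join("".join(letters) for letters in zip(*chosen))


# "SENDSUPPLI" transposed with the period-5 key 2 3 1 5 4 gives "ENSSDPPUIL".
cols = columns("ENSSDPPUIL", 5)
print(cols)                               # ['EP', 'NP', 'SU', 'SI', 'DL']
print(reassemble(cols, [2, 0, 1, 4, 3]))  # SENDSUPPLI
```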
Theorem 6: If $T$ is pure then $T_i T_j^{-1} T = T$, where $T_i, T_j$ are any two transformations of $T$. Conversely, if this is true for any $T_i, T_j$ in a system $T$, then $T$ is pure.
The first part of this theorem is obvious from the definition of a pure system. To prove the second part we note first that if $T_i T_j^{-1} T = T$, then $T_i T_j^{-1} T_s$ is a transformation of $T$. It remains to show that all keys are equiprobable. We have $T = \sum_s p_s T_s$ and
$$\sum_s p_s T_i T_j^{-1} T_s = \sum_s p_s T_s.$$
The term in the left hand sum with $s = j$ yields $p_j T_i$. The only term in $T_i$ on the right is $p_i T_i$. Since all coefficients are non-negative, it follows that
$$p_i \ge p_j.$$
The same argument holds with $i$ and $j$ interchanged, and consequently
$$p_j = p_i$$
and $T$ is pure. Thus the condition that $T_i T_j^{-1} T = T$ might be used as an alternative definition of a pure system.
The property of purity in a system is connected with idempotence. Thus consider the system $S = TT'$ where $T$ is pure. We have
$$T_i T_j^{-1} T_s T_k^{-1} = T_r T_k^{-1},$$
since $T_i T_j^{-1} T_s = T_r$ for some $r$ by purity, so that the transformations of $S^2$ are the same as those of $S$, and since both $S$ and $S^2$ are pure we have
$$S = S^2.$$
Theorem 7: If $T$ is pure, $S = TT'$ is pure and $S^2 = S$.
An endomorphic system $T$ which satisfies the condition $T_i T_j = T_s$ (but not necessarily with all key probabilities equal) can be shown to approach a pure cipher on raising to a high power, namely the one with the same transformations but with all probabilities equalized. In fact the probabilities for $T^{n+1}$ are derived from those for $T^n$ by a Markoff process, of a special type due to the group property. This special type always approaches the limit of equalized probabilities.
This same argument applies more generally. We have
Theorem 8: Let $T$ be any endomorphic cipher. If $T^n$ approaches any limit at all, which will necessarily occur if all the transformations of $T^n$ lie in a finite set (no matter how large $n$) and the transformations of $T$ include the identity, then this limit will be a pure cipher.
As an example consider the cipher
$$R = pT + qS$$
where $T$ is transposition with random key and $S$ substitution with random key. We have
$$S^2 = S, \qquad T^2 = T, \qquad ST = TS$$
and hence any product containing both $T$'s and $S$'s, such as $TSTTTSS$, reduces to $ST$. Thus
$$R^n = p^n T + q^n S + (1 - p^n - q^n) ST.$$
As $n \to \infty$ the first two terms approach zero and
$$\lim_{n \to \infty} R^n = ST.$$
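The coefficients of $R^n$ can be tracked with exact fractions, using only the three relations $S^2 = S$, $T^2 = T$ and $ST = TS$: multiplying by $R$ keeps the pure-$T$ and pure-$S$ terms separate and collapses every mixed product to $ST$. The value $p = 1/3$ is a hypothetical choice.

```python
from fractions import Fraction


def power_coefficients(p, n):
    """Coefficients (a, b, c) in R^n = aT + bS + c ST for R = pT + qS,
    using T^2 = T, S^2 = S and ST = TS."""
    q = 1 - p
    a, b, c = p, q, Fraction(0)   # R^1 = pT + qS
    for _ in range(n - 1):
        # multiply by R: T.T = T, S.S = S, any mixed product collapses to ST
        a, b, c = a * p, b * q, c + a * q + b * p
    return a, b, c


p = Fraction(1, 3)
for n in (1, 2, 5, 10):
    a, b, c = power_coefficients(p, n)
    assert (a, b, c) == (p**n, (1 - p)**n, 1 - p**n - (1 - p)**n)
print(float(power_coefficients(p, 60)[2]))  # close to 1: R^n tends to ST
```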
The concepts of pure and mixed languages and pure and mixed ciphers have an application in practical cryptanalysis, if we interpret them somewhat loosely. When a cryptographer starts work on a cryptogram, his first job is to determine the original language. Approximately, then, he is determining the pure component of the general language space
$$L = p_1 L_1 + p_2 L_2 + \cdots + p_n L_n$$
where, say, $L_1$ is English, $L_2$ German, etc. Of course these are not pure, but the different components of them are fairly close together in statistical structure.
The second thing a cryptographer does is to determine the "type" of cipher that was used; usually this is about the same as finding the pure component in the general cipher system
$$R = p_1 S + p_2 T + p_3 V + \cdots$$
where, say, $S$ is simple substitution, $T$ is transposition, etc. A Vigenère $V$ of unknown period is not a pure cipher, but the decomposition
$$V = p_1 V_1 + p_2 V_2 + p_3 V_3 + \cdots$$
where $V_i$ is of period $i$, is into pure components (if all keys are equally likely for any period). In solving a Vigenère the first problem is to determine the period. The same is true in transposition.
The reason for this initial isolation of pure or nearly pure language and cipher is that only then can a simple, meaningful statistical analysis be carried out.
16. Involutory Systems
If every trsnsf orrar: tioh in n systen T is its y.
inverse, i.e. If
Ti Ti - 1
for every i, the system will be called involutory. Such
systems are important prrcticrlly since the enciphering r
deciphering operations -re then identical. This l«vds t*
sinplifiod instructions to cryptographic clerks in manual
oper^ti^n, or in mechanical cases the sane machine with t
sane key setting nay be usee" for bath ~perctions.
Examples: In simple substitution we may limit our transformations to those in which, when letter θ is the substitute for φ, φ is the substitute for θ. Another example is the Beaufort cipher.
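Both examples can be checked directly: enciphering twice with the same key returns the message. The sketch assumes uppercase A to Z text; `beaufort`, `reciprocal_table` and `substitute` are illustrative names, and letters left out of every swap are simply left unchanged.

```python
import string

ALPHABET = string.ascii_uppercase


def beaufort(text, key):
    """Beaufort: c = k - m (mod 26); the same call enciphers and deciphers."""
    out = []
    for i, ch in enumerate(text):
        k = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(k - ALPHABET.index(ch)) % 26])
    return "".join(out)


def reciprocal_table(pairs):
    """Simple substitution in which theta -> phi always implies phi -> theta."""
    table = {c: c for c in ALPHABET}
    used = set()
    for a, b in pairs:
        if a == b or a in used or b in used:
            raise ValueError("pairs must be disjoint swaps of distinct letters")
        used.update((a, b))
        table[a], table[b] = b, a
    return table


def substitute(text, table):
    return "".join(table[c] for c in text)


message = "ATTACKATDAWN"
swap = reciprocal_table([("A", "Q"), ("T", "M"), ("K", "Z")])
assert beaufort(beaufort(message, "KEY"), "KEY") == message
assert substitute(substitute(message, swap), swap) == message
```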
If T is involutory, so is the system whose operations are
$$S T_i S^{-1}$$
since
$$(S T_i S^{-1})(S T_i S^{-1}) = S T_i T_i S^{-1} = S S^{-1} = 1$$
17. Similar and Weakly Similar Systems
Two secrecy systems R and S will be said to be similar if there exists a transformation A having an inverse $A^{-1}$ such that
$$R = A S$$
This means that enciphering with R is the same as enciphering with S and then operating on the result with the transformation A. If we write $R \approx S$ to mean R is similar to S, then it is clear that $R \approx S$ implies $S \approx R$. Also, $R \approx S$ and $S \approx T$ imply $R \approx T$, and finally $R \approx R$. These are summarized in mathematical terminology by saying that similarity is an equivalence relation.
The cryptographic significance of similarity is that if $R \approx S$, then R and S are equivalent from the cryptanalytic point of view. Indeed, if a cryptanalyst intercepts a cryptogram in system S, he can transform it to one in system R by merely applying the transformation A to it. A cryptogram in system R is transformed to one in S by applying $A^{-1}$. If R and S are applied to the same language or message space, there is a one-to-one correspondence between the resulting cryptograms. Corresponding cryptograms give the same distribution of a posteriori probabilities for all messages.
If one has a method of breaking the system R, then any system S similar to R can be broken by reducing to R through application of the operation A. This is a device that is frequently used in practical cryptanalysis.
Examples: As a trivial example, simple substitution where the substitutes are not letters but arbitrary symbols is similar to simple substitution using letter substitutes. A second example is the Caesar and the reversed Caesar type ciphers. The latter is sometimes broken by first transforming into a Caesar type. The Vigenère, Beaufort and Variant Beaufort are all similar when the key is random. The "autokey" cipher primed with the key $K_1 K_2 \ldots K_d$ is similar to a Vigenère type with the key alternately added and subtracted mod 26. The transformation A in this case is that of "deciphering" the autokey with a series of d A's for the priming key.
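The reduction of a reversed Caesar to a Caesar can be made concrete. In the sketch below the reversed Caesar is taken as c = k - m (mod 26), and A is the letter negation x → -x (mod 26), which is its own inverse; the names `transform_a` and `caesar_candidates` are hypothetical. Applying A to a reversed-Caesar cryptogram leaves an ordinary Caesar cryptogram, which is then solved by trying all 26 keys.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, k):
    """Caesar: c = m + k (mod 26)."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in text)


def reversed_caesar(text, k):
    """Reversed Caesar: c = k - m (mod 26)."""
    return "".join(ALPHABET[(k - ALPHABET.index(c)) % 26] for c in text)


def transform_a(text):
    """The operation A: x -> -x (mod 26). A is its own inverse."""
    return "".join(ALPHABET[(-ALPHABET.index(c)) % 26] for c in text)


def caesar_candidates(cryptogram):
    """Every Caesar decipherment ('running down the alphabet')."""
    return [caesar(cryptogram, -k) for k in range(26)]


# A(k - m) = m - k, i.e. a Caesar with key -k, so the reversed Caesar
# equals A applied after a Caesar: the two systems are similar.
cryptogram = reversed_caesar("RETREAT", 7)
assert "RETREAT" in caesar_candidates(transform_a(cryptogram))
```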
Two systems R and S are weakly similar if there exist two transformations A and B having inverses $A^{-1}$ and $B^{-1}$ with
$$R = A S B$$
This means that system R is the same as applying first B to the language, then S, and finally A. This relation is also an equivalence relation.
Finding a method of solution for system R with language L is equivalent to finding a solution for S with language B L.
We may note that if R is pure and S is weakly similar to R, then S is pure. This follows from
$$R_i R_j^{-1} R_k = R_t$$
$$R_i = A S_i B, \qquad R_j^{-1} = B^{-1} S_j^{-1} A^{-1}, \qquad R_k = A S_k B$$
where we assume corresponding transformations in R and S to have the same subscripts. Hence
$$R_i R_j^{-1} R_k = A S_i S_j^{-1} S_k B = R_t$$
$$S_i S_j^{-1} S_k = A^{-1} R_t B^{-1} = S_t$$
and S is therefore pure.
PART II
Theoretical Secrecy
Introduction
We now consider problems connected with the "theoretical secrecy" of a system. How immune is a system to cryptanalysis when the cryptanalyst has unlimited time and manpower available for the analysis of cryptograms? Does a cryptogram have a unique solution (even though it may require an impractical amount of work to find it) and, if not, how many reasonable solutions does it have? How much text in a given system must be intercepted before the solution becomes unique? Are there systems which never become unique in solution no matter how much enciphered text is intercepted? Are there systems for which no information whatever is given to the enemy no matter how much text is intercepted?
18. Perfect Secrecy
Let us suppose the possible messages are finite in number, $M_1, \ldots, M_n$, and have a priori probabilities $P(M_1), \ldots, P(M_n)$, and that these are enciphered into the possible cryptograms $E_1, \ldots, E_m$ by
$$E = T_i M$$
The cryptanalyst intercepts a particular E and can then calculate the a posteriori probabilities for the various messages, $P_E(M)$. It is natural to define perfect secrecy by the condition that, for all E, the a posteriori probabilities are equal to the a priori probabilities, independently of the values of these. In this case, intercepting the message has given the cryptanalyst no information.* Any action of his which depends on the information contained in the cryptogram cannot be altered, for all of his probabilities as to what the cryptogram contains remain unchanged. On the other hand, if the condition is not satisfied, there will exist situations in which the enemy has certain a priori probabilities, and certain keys and messages are chosen, where the enemy's probabilities do change. This in turn may affect his actions, and thus perfect secrecy has not been obtained. Hence the definition given is necessarily required by our ideas of what perfect secrecy should mean.

*A purist might object that the enemy has obtained a bit of information in that he knows a message was sent. This may be answered by having among the messages a "blank" corresponding to "no message". If no message is originated, the blank is enciphered and sent as a cryptogram. Then even this modicum of remaining information is eliminated.
A necessary and sufficient condition for perfect secrecy can be found as follows. We have by Bayes' theorem
$$P_E(M) = \frac{P(M)\, P_M(E)}{P(E)}$$
and this must equal P(M) for perfect secrecy. Hence either P(M) = 0, a solution that must be excluded since we demand the equality independent of the values of P(M), or
$$P_M(E) = P(E)$$
for every M and E. Conversely, if $P_M(E) = P(E)$, then
$$P_E(M) = P(M)$$
and we have perfect secrecy. Thus we have the result:
Theorem 9: A necessary and sufficient condition for perfect secrecy is that
$$P_M(E) = P(E)$$
for all M and E. That is, $P_M(E)$ must be independent of M.
Equivalently, the total probability of all keys that transform $M_i$ into a given cryptogram E is equal to that of all keys transforming $M_j$ into the same E.
Now there must be at least as many E's as there are M's since, for a fixed i, $T_i$ gives a one-to-one correspondence between all the M's and some of the E's. For perfect secrecy $P_M(E) = P(E) \neq 0$ for any of these E's and any M. Hence there is at least one key transforming any M into any of these E's. But all the keys from a fixed M to different E's must be different, and therefore the number of different keys is at least as great as the number of M's. It is possible to obtain perfect secrecy with no more, as one shows by the following example. Let the $M_i$ be numbered 1 to n and the $E_s$ the same, and using n keys let
$$T_i M_j = E_s$$
where $s = i + j \pmod{n}$. In this case we see that $P_M(E) = 1/n = P(E)$ and we have perfect secrecy. An example is shown with n = 5.
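The n = 5 shift system can be checked mechanically against Theorem 9. This sketch assumes equally likely keys and uses exact `Fraction` arithmetic; the message priors are arbitrary illustrative values, and the function names are hypothetical. It confirms that $P_M(E)$ does not depend on M and that every a posteriori distribution equals the a priori one.

```python
from fractions import Fraction


def shift_system(n):
    """T_i M_j = E_s with s = i + j (mod n); returns {key: {message: cryptogram}}."""
    return {i: {j: (i + j) % n for j in range(n)} for i in range(n)}


def p_e_given_m(system, key_probs, m, e):
    """P_M(E): total probability of the keys carrying M into E."""
    return sum(p for k, p in key_probs.items() if system[k][m] == e)


def is_perfect(system, key_probs, messages, cryptograms):
    """Theorem 9: for every E, P_M(E) must not depend on M."""
    return all(
        len({p_e_given_m(system, key_probs, m, e) for m in messages}) == 1
        for e in cryptograms
    )


def a_posteriori(system, key_probs, msg_probs, e):
    """P_E(M) by Bayes' theorem."""
    joint = {m: pm * p_e_given_m(system, key_probs, m, e) for m, pm in msg_probs.items()}
    total = sum(joint.values())
    return {m: p / total for m, p in joint.items()}


n = 5
system = shift_system(n)
uniform_keys = {i: Fraction(1, n) for i in range(n)}
priors = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8),
          3: Fraction(1, 16), 4: Fraction(1, 16)}      # arbitrary a priori values
assert is_perfect(system, uniform_keys, range(n), range(n))
assert all(a_posteriori(system, uniform_keys, priors, e) == priors for e in range(n))
```

With unequal key probabilities the same system fails the test, which matches condition (2) in the characterization that follows.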
These perfect systems, in which the number of cryptograms, the number of messages, and the number of keys are all equal, are characterized by the properties that (1) each M is connected to each E by exactly one line, and (2) all keys are equally likely. Thus the three matrix representations of the system are "Latin squares".
We have then concealed completely an amount of information at most log n with a size of key log n. This is the first example of a general principle which we will often see: there is a limit to what can be obtained with a given key size, since the amount of uncertainty we can introduce into the solution of a cryptogram cannot be greater than the key size. Here we have concealed all the information, but the key size is as large as the message space.
We now consider the case where |M| is infinite; in particular, suppose the message is generated as an unending sequence of letters by a Markoff process. The maximum rate of this source is $R_0$. It is clear from our results above that no finite key will give perfect secrecy. We suppose, then, that the key source generates key in the same manner, i.e. as an infinite sequence of symbols with a mean rate $R_K$. Suppose that only a certain length of key $L_K$ is needed to encipher and decipher a length of message $L_M$.
Theorem 10: For perfect secrecy (when the a priori probabilities of the various messages can be anything), for large $L_M$
$$R_0 L_M \le R_K L_K$$
and a key rate of $(R_0 + \epsilon) L_M / L_K$ is asymptotically sufficient.
This may be proved by essentially the same method as the finite case. This case is realized by the Vernam system.
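A minimal sketch of the Vernam system on bit lists, assuming one uniformly random key bit per message bit (drawn here with Python's `secrets` module). For a tiny message length every key can be enumerated, which shows directly that each cryptogram has probability $2^{-n}$ whatever the message, so $P_M(E) = P(E)$.

```python
import secrets
from itertools import product


def vernam(bits, key):
    """Vernam: add key bits mod 2; the same operation deciphers."""
    if len(key) < len(bits):
        raise ValueError("the Vernam system needs one key bit per message bit")
    return [b ^ k for b, k in zip(bits, key)]


def cryptogram_distribution(message):
    """P_M(E) for a uniformly random key, found by enumerating every key."""
    n = len(message)
    counts = {}
    for key in product((0, 1), repeat=n):
        e = tuple(vernam(list(message), list(key)))
        counts[e] = counts.get(e, 0) + 1
    return {e: c / 2 ** n for e, c in counts.items()}


message = [1, 0, 1, 1, 0]
key = [secrets.randbits(1) for _ in message]     # one key bit per message bit
assert vernam(vernam(message, key), key) == message
```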
These results have been deduced on the basis of unknown or arbitrary a priori probabilities for the messages. The key required for perfect secrecy then depends on the total number of possible messages, or on the maximum rate $R_0$ of the message source.
One would suspect that if the message space has fixed, known statistics, so that it has a definite mean rate R of generating information, then the amount of key needed could be reduced, in an average sense, in just this ratio $R/R_0$, and this is indeed true. In fact, the message can be passed through a transducer which transforms it into a normal form and reduces the expected length in just this ratio, and then a Vernam system may be applied to the result. Evidently the amount of key used per letter of message is statistically reduced by a factor $R/R_0$, and in this case the key source and information source are just matched: an alternative of key conceals an alternative of information. It is easily seen also, by the methods used in the "Information" paper, that this is the best that can be done.
Theorem 11: Perfect secrecy (omitting the condition of independence of a priori probabilities) for a source with fixed statistics and a rate R of generating information can be achieved with a key source which generates at the rate $(R + \epsilon) L_M / L_K$, where $L_M$ and $L_K$ are message and key lengths which correspond. A rate less than $R L_M / L_K$ is insufficient.
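The matching of key to information can be sketched for a hypothetical four-letter source whose probabilities are powers of 1/2, so that a simple prefix code, standing in for the transducer, reaches the rate R exactly; real sources only approach it. The source, the code and all names are illustrative assumptions. The coded bits are then enciphered with a Vernam key, so on the average only R = 1.75 key bits are used per letter instead of $R_0 = 2$.

```python
import math
import secrets

# Hypothetical source with known statistics (an assumption for illustration).
PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
# Prefix code matched to PROBS; it plays the part of the "transducer".
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {word: letter for letter, word in CODE.items()}


def encode(message):
    return [int(bit) for letter in message for bit in CODE[letter]]


def decode(bits):
    out, word = [], ""
    for bit in bits:
        word += str(bit)
        if word in DECODE:
            out.append(DECODE[word])
            word = ""
    if word:
        raise ValueError("bit string ends inside a code word")
    return "".join(out)


def vernam(bits, key):
    return [b ^ k for b, k in zip(bits, key)]


rate_r = -sum(p * math.log2(p) for p in PROBS.values())           # R = 1.75 bits/letter
rate_r0 = math.log2(len(PROBS))                                   # R0 = 2 bits/letter
key_bits_per_letter = sum(PROBS[c] * len(CODE[c]) for c in PROBS)  # 1.75 on average

message = "ABACADAB"
bits = encode(message)
key = [secrets.randbits(1) for _ in bits]      # one key bit per coded bit
assert decode(vernam(vernam(bits, key), key)) == message
```

Here the key saving is the factor $R/R_0 = 0.875$.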
Perfect secrecy systems have a place in the practical picture. They may be used either where the greatest importance is attached to complete secrecy, e.g. correspondence between the highest levels of command, or in cases where the number of possible messages is small. Thus, to take an extreme example, if only two messages, "yes" or "no", were anticipated, a perfect system would be in order, with perhaps the following transformation table:

              Key
  Message     A     B
  yes         0     1
  no          1     0
The disadvantage of perfect systems for large correspondence systems is, of course, the equivalent amount of key that must be sent. In succeeding sections we consider what can be achieved with smaller key size, in particular with finite keys.
19. Equivocation
Let us suppose that a simple substitution cipher has been used on English text and that we intercept a certain amount, N letters, of the enciphered text. For N fairly large, more than say 50 letters, there is nearly always a unique solution to the cipher, i.e. a single good English sequence which transforms into the intercepted material by a simple substitution. With a smaller N, however, the chance of more than one solution is greater; with N = 15 there will generally be quite a number of possible fragments of text that would fit, while with N = 8 a good fraction (of the order of 1/8) of all reasonable English sequences of that length are possible, since there is seldom more than one repeated letter in the 8. With N = 1 any letter is clearly possible and has the same a posteriori probability as its a priori probability. For one letter the system is perfect.
This happens generally with solvable ciphers. Before any material is intercepted we can imagine the a priori probabilities attached to the various possible messages, and also to the various keys. As material is intercepted, the cryptanalyst calculates the a posteriori probabilities; and as N increases the probabilities of certain messages increase, and of most decrease, until finally only one is left, which has a probability nearly one, while the total probability of all others is nearly zero.
This calculation can actually be carried out for simple systems. Table 1 shows the a posteriori probabilities for a Caesar type cipher applied to English text, with the key chosen at random from the 26 possibilities. To enable the use of standard letter, digram and trigram frequency tables, the text has been started at a random point (by opening a book and putting a pencil down at random on the page). The message selected in this way begins "creases to ...", starting inside the word "increases". If the message were to start with the beginning of a sentence, a different set of probabilities would have to be used, corresponding to the frequencies of letters, digrams, etc., at the beginning of sentences.
The Caesar with random key is a pure cipher, and the particular key chosen does not affect the a posteriori probabilities. To determine these we need merely list the possible decipherments by all keys and calculate their a priori probabilities. The a posteriori probabilities are these divided by their sum. These possible decipherments are found by the standard process of "running down the alphabet" from the message and are listed at the left of Table 1. They form the residue class for the message. For one intercepted letter the a posteriori probabilities are equal to the a priori probabilities for letters; these are shown in the column headed N = 1. For two intercepted letters the probabilities are those for digrams, adjusted to sum to unity, and these are shown in the column N = 2.
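The procedure just described can be sketched as follows: run down the alphabet to get the residue class, weight each decipherment by its a priori probability, and divide by the sum. The `toy_prior` below uses made-up letter weights, not real English frequencies; any function returning a priori probabilities (letter, digram or trigram based) could be substituted.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text, k):
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in text)


def residue_class(cryptogram):
    """'Running down the alphabet': all 26 possible decipherments."""
    return [shift(cryptogram, k) for k in range(26)]


def caesar_posteriors(cryptogram, prior):
    """A posteriori probabilities of the decipherments, keys equally likely.

    `prior(text)` returns the a priori probability (or any proportional
    weight) of a plaintext string; the posteriors are these values
    divided by their sum.
    """
    candidates = residue_class(cryptogram)
    weights = [prior(c) for c in candidates]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights)}


# Hypothetical independent-letter weights; NOT real English frequencies.
TOY_FREQ = {"C": 0.1, "R": 0.1, "E": 0.2, "A": 0.1, "S": 0.1}


def toy_prior(text):
    p = 1.0
    for ch in text:
        p *= TOY_FREQ.get(ch, 0.01)
    return p


posteriors = caesar_posteriors(shift("CREAS", 11), toy_prior)
best = max(posteriors, key=posteriors.get)
```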
Table 1
A Posteriori Probabilities for a Caesar Type Cryptogram

Decipherments   N = 1    N = 2    N = 3    N = 4
C R E A S       .032     .015     .111     .55
D S F B T       .036     .068
E T G C U       .123     .170
F U H D V       .023     .023
G V I E W       .016
H W J F X       .051     .015
I X K G Y       .072
J Y L H Z       .001
K Z M I A       .005
L A N J B       .040     .072     .250     .01
M B O K C       .020     .019     .022     .01
N C P L D       .072     .066
O D Q M E       .079     .034
P E R N F       .023     .085     .438     .43
Q F S O G       .002
R G T P H       .060     .013
S H U Q I       .066     .064     .005
T I V R J       .096     .272     .166
U J W S K       .030
V K X T L       .009
W L Y U M       .020     .008     .005
X M Z V N       .002
Y N A W O       .019     .006
Z O B X P       .001
A P C Y Q       .080     .066
B Q D Z R       .016
Q (decimal digits)  1.248   .999   .602   .340
Trigram frequencies have also been tabulated, and these are shown in column N = 3. For four and five letter sequences the probabilities were obtained by multiplication from trigram frequencies since, approximately,
$$p(ijkl) = p(ijk)\, p_{jk}(l)$$
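The multiplication rule above can be sketched as a second-order chain: each further letter contributes $p_{jk}(l) = p(jkl) / \sum_x p(jkx)$. The trigram table here is a hypothetical two-letter example, not real English data, and `chain_probability` is an illustrative name.

```python
from collections import defaultdict


def chain_probability(seq, trigram):
    """p(ijkl...) ~ p(ijk) * p_jk(l) * ..., with p_jk(l) = p(jkl) / sum_x p(jkx).

    `trigram` maps three-letter strings to probabilities summing to 1.
    """
    if len(seq) < 3:
        raise ValueError("need at least three letters")
    pair = defaultdict(float)              # p(jk.) = sum over x of p(jkx)
    for tri, p in trigram.items():
        pair[tri[:2]] += p
    prob = trigram.get(seq[:3], 0.0)
    for i in range(3, len(seq)):
        context, tri = seq[i - 2:i], seq[i - 2:i + 1]
        if pair[context] == 0.0:
            return 0.0
        prob *= trigram.get(tri, 0.0) / pair[context]
    return prob


# Hypothetical two-letter trigram table, for illustration only.
TOY_TRIGRAMS = {"ABA": 0.25, "BAB": 0.25, "ABB": 0.25, "BBA": 0.25}
```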
Note that at three letters the field has narrowed down to four messages of fairly high probability, the others being small in comparison. At four there are two possibilities, and at five just one, the correct decipherment.
In principle this could be carried out with any system, but unless the key is very small the number of possibilities is so large that the work involved prohibits the actual calculation.
This set of a posteriori probabilities describes how the cryptanalyst's knowledge of the message and key gradually becomes more precise as enciphered material is obtained. This description, however, is much too involved and difficult to obtain for our purposes. What is desired is a simplified description of this approach to uniqueness of the possible solutions.
We will first define a quantity Q called the "equivocation", which measures in an average way how uncertain the solution is, or how far it is from unicity. Suppose that a certain cryptogram E of N letters has been intercepted. The cryptanalyst can in principle calculate the a posteriori probabilities by the use of Bayes' theorem. Thus
$$P_E(M) = \frac{P(M)\, P_M(E)}{P(E)}$$
Similarly, the probabilities for the various keys after E has been intercepted are given by
$$P_E(K) = \frac{P(K)\, P_K(E)}{P(E)}$$
The equivocation of the message should measure in some way how spread out these probabilities $P_E(M)$ are: how far they are from being concentrated at one message. In line with general principles of measuring such dispersion, as in the theory of choice, uncertainty, and generating information, we define the equivocation of the message when E has been intercepted as
$$Q_E(M) = -\sum_M P_E(M) \log P_E(M)$$
the summation being over all possible messages. Similarly, the equivocation in key when E is intercepted is given by
$$Q_E(K) = -\sum_K P_E(K) \log P_E(K)$$
The same general arguments used to justify our measure of information rate may be used here to justify the equivocation measure. We note that zero equivocation requires that one message (or key) have probability one, all others zero. Equivocation is measured in the same units as information, i.e. alternatives, decimal digits, etc., according as the logarithmic base is 2, 10, etc. In fact, equivocation is almost identical with information, the difference being one of point of view. In information we stress the notion of how much freedom we have in choosing one element from a set with certain probabilities; in equivocation we emphasize the uncertainty of our knowledge of what was chosen when the probabilities have certain values.
Although any one number can hardly be expected to describe the set $P_E(M)$ perfectly for all purposes, I think the measure defined here does as well as any single statistic can. Some of the theorems which follow indicate the mathematical "naturalness" of this particular measure.
The values of equivocation for the Caesar type cryptogram considered above have been calculated and are given in the last row of Table 1. This is the Q for both key and message, the two being equal in this case.
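The last row of Table 1 can be recomputed from the columns above it. This sketch renormalizes each column (the entries are rounded) and uses base-10 logarithms, since the row is in decimal digits; `equivocation` is an illustrative name.

```python
import math


def equivocation(probs, base=10):
    """-sum p log p over a set of a posteriori probabilities.

    The probabilities are renormalized first (the table entries are rounded).
    base=10 gives decimal digits, as in the last row of Table 1; base=2 gives
    alternatives.
    """
    total = sum(probs)
    return -sum((p / total) * math.log(p / total, base) for p in probs if p > 0)


# Nonzero entries of the N = 3 and N = 4 columns of Table 1.
column_n3 = [0.111, 0.250, 0.438, 0.166, 0.022, 0.005, 0.005]
column_n4 = [0.55, 0.43, 0.01, 0.01]
```

The results, about .603 and .340, agree with the .602 and .340 of the table to within the rounding of the tabulated entries.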
The definitions given above involve a particular intercepted E, and are the equivocations for that intercepted cryptogram. We wish, however, to find a measure of the equivocation for the system as a whole, which will describe this progress toward uniqueness as N increases in an average sort of way. To do this we form a weighted average of the equivocations for each particular intercepted message E, weighting in accordance with the probabilities of getting the E in question. This will be called the mean equivocation of the system or, where there is no chance of confusion with the narrower equivocation for a particular E, merely the equivocation. The mean equivocation of message is
$$Q(M) = -\sum_{M,E} P(E)\, P_E(M) \log P_E(M)$$
the summation being over all M and all E. Since
$$P(E)\, P_E(M) = P(E, M)$$
the probability of getting both E and M, we can write this
$$Q(M) = -\sum_{M,E} P(M,E) \log P_E(M) = -\sum_{M,E} P(M,E) \log \frac{P(M)\, P_M(E)}{P(E)}$$
Similarly
$$Q(K) = -\sum_{K,E} P(K,E) \log \frac{P(K)\, P_K(E)}{P(E)}$$
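The mean equivocation can be computed directly from a joint distribution. In this sketch `joint` is a hypothetical dictionary mapping (message, cryptogram) pairs to P(M, E); passing (key, cryptogram) pairs instead gives Q(K). The two small examples are illustrative: a one-bit Vernam system, where nothing is learned, and no encipherment at all, where everything is.

```python
import math
from collections import defaultdict


def mean_equivocation(joint, base=2):
    """Q(M) = -sum over (M, E) of P(M,E) log P_E(M), with P_E(M) = P(M,E) / P(E)."""
    p_e = defaultdict(float)
    for (_, e), p in joint.items():
        p_e[e] += p
    return -sum(p * math.log(p / p_e[e], base)
                for (_, e), p in joint.items() if p > 0)


# One-bit Vernam with equally likely messages: the cryptogram tells nothing.
vernam_1bit = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# No encipherment at all: the cryptogram is the message.
no_cipher = {(0, 0): 0.5, (1, 1): 0.5}
```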
Either of these mean equivocations is a theoretical measure of the secrecy value of the system. We say theoretical since, even when the equivocation is zero, which corresponds to no uncertainty as to the message, it may require a tremendous amount of labor to locate the particular message whose probability is one. It might, for example, be necessary to try every possible K in succession until one was found that transformed the intercepted E into reasonable text in the language. Such a system would be practically very good, but theoretically solvable. The equivocation may be said to measure the degree of secrecy when the cryptanalyst has unlimited time and energy.
The equivocation is, of course, a function of N, the number of letters intercepted. The functions Q(K, N) and Q(M, N) will be called the equivocation characteristics of the system.
The following data will be helpful in forming a picture of what small values of equivocation represent.
An equivocation of .1 alternative would result (1) if 9 times in 10 there was no uncertainty as to M and the tenth time two M's were equally probable, or (2) if every time there were two possibilities, one with probability .983, the other with probability .017, or (3) if 99 times in 100 there was no uncertainty and the 100th time 1000 equally likely possibilities.
An equivocation of .01 would result (1) if every time there were two possibilities, one with probability .999, the other with probability .001, or (2) if 99 times in 100 there is no uncertainty and the other time two equally likely possibilities, or (3) if 999 times in 1000 there is no uncertainty and the other time 6 or 7 equally likely possibilities.
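Cases of this kind are weighted averages of entropies, so they can be evaluated with a few lines. The sketch below, in alternatives (base 2), encodes case (1) and case (3) for .1 alternative and case (2) for .01; the other cases can be entered in the same (weight, distribution) form. Names are illustrative.

```python
import math


def entropy(dist):
    """-sum p log p, in alternatives."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def mixture_equivocation(cases):
    """Mean equivocation when the a posteriori distribution is `dist`
    with probability `w`; `cases` is a list of (w, dist) pairs."""
    return sum(w * entropy(dist) for w, dist in cases)


case_1 = [(0.9, [1.0]), (0.1, [0.5, 0.5])]              # .1 alternative, case (1)
case_3 = [(0.99, [1.0]), (0.01, [1 / 1000] * 1000)]     # .1 alternative, case (3)
small_2 = [(0.99, [1.0]), (0.01, [0.5, 0.5])]           # .01 alternative, case (2)
```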
20. Properties of Equivocation
Equivocation may be shown to have a number of interesting properties, most of which fit into our intuitive picture of how such a quantity should behave. We may first show, by an example, the somewhat surprising fact that after a cryptanalyst has intercepted certain special E's, his equivocation as to key or message may be greater than before he intercepted anything. The intercepted material has increased his ignorance of what happened! Suppose there are only two messages, $M_1$ and $M_2$, with a priori probabilities p and q, and that a simple substitution is used according to the following table, the two keys $K_1$ and $K_2$ also having the a priori probabilities p and q.
          K1    K2
  M1      E2    E1
  M2      E1    E2
Before the interception, the equivocation of both key and message is $-(p \log p + q \log q)$, which is less than one alternative if $p \neq q$. If $p \gg q$ there is little uncertainty as to which message and key will be chosen: $M_1$ and $K_1$. Now suppose he intercepts $E_1$. The a posteriori probabilities of both keys and both messages are easily seen to be 1/2, and hence the equivocation for both key and message is one alternative, greater than before. On the other hand, if $E_2$ is intercepted, the more probable event, the equivocation for both key and message decreases, more than enough to compensate for the other increase, and the mean equivocation of both key and message decreases. This is a general property of all secrecy systems.
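The example can be worked numerically. This sketch assumes the table above (K1 carries M1 to E2 and M2 to E1; K2 carries M1 to E1 and M2 to E2) and takes p = 0.9 as an illustrative value. Key and message equivocations coincide here, so one number serves for both.

```python
import math


def entropy(probs):
    """-sum p log p, in alternatives."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def two_message_example(p):
    """Equivocations before and after interception, P(M1) = P(K1) = p."""
    q = 1 - p
    p_e1 = p * q + q * p            # (M1, K2) or (M2, K1)
    p_e2 = p * p + q * q            # (M1, K1) or (M2, K2)
    before = entropy([p, q])
    after_e1 = entropy([p * q / p_e1, q * p / p_e1])
    after_e2 = entropy([p * p / p_e2, q * q / p_e2])
    mean_after = p_e1 * after_e1 + p_e2 * after_e2
    return before, after_e1, after_e2, mean_after


before, after_e1, after_e2, mean_after = two_message_example(0.9)
```

With p = 0.9 the equivocation is about .47 alternative before interception, exactly 1 after $E_1$, about .10 after $E_2$, and about .26 on the average.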
Theorem 12: The mean equivocation of key, $Q_K(N)$, is a non-increasing function of N. The mean equivocation of the first A letters of the message is a non-increasing function of the number N which have been intercepted. If N letters have been intercepted, the equivocation of the first N letters of message is less than or equal to that of the key. These may be written
$$Q_K(S) \le Q_K(N), \qquad S \ge N$$
$$Q_M(S) \le Q_M(N), \qquad S \ge N \quad (Q_M \text{ for the first } A \text{ letters of message})$$
$$Q_M(N) \le Q_K(N)$$
The qualification regarding A letters in the second result of the theorem is made so that the equivocation will not be calculated with respect to the amount of message that has been intercepted. If it is, the message equivocation may (and usually does) increase for a time, due merely to the fact that more letters stand for a larger possible range of messages. The results of the theorem are what we might hope for from a good measure of equivocation, since we would hardly expect to be worse off on the average after intercepting material than before. The fact that they can be proved gives additional justification to our definition.
The results of this theorem can be proved by a substitution in property 6 of Section 1. Thus, to prove the first or second, we have for any chance events A and B
$$Q(B) \ge Q_A(B)$$
If we identify B with the key (knowing the first S letters of cryptogram) and A with the remaining N - S letters, we obtain the first result. Similarly, identifying B with the message gives the second result. The last result follows from
$$Q(M) \le Q(K) + Q_K(M)$$
and the fact that $Q_K(M) = 0$, since K uniquely determines M.
Theorem 13:
$$Q(K) = |M| - |E| + |K|$$
$$Q(M) = |M| - |E| + |H|$$
where
$$|H| = -\sum_{M,E} P(M,E) \log P_M(E)$$
We have
$$Q(K) = -\sum_{E,K} P(E,K) \log \frac{P(K)\, P_K(E)}{P(E)}$$
Hence
$$Q(K) = -\sum P(K) P_K(E) \log P(K) - \sum P(K) P_K(E) \log P_K(E) + \sum P(K) P_K(E) \log P(E)$$
Summing the first term on E gives $-\sum P(K) \log P(K) = |K|$. In the second term $P_K(E)$ is P(M), where M is the unique message that gives E with key K; summing on K then gives $-\sum P(M) \log P(M) = |M|$. The third term is $\sum P(E) \log P(E) = -|E|$.
The second equation in the theorem is proved by the same method:
$$Q(M) = -\sum P(E)\, P_E(M) \log P_E(M)$$
$$= -\sum P(M)\, P_M(E) \log \frac{P(M)\, P_M(E)}{P(E)}$$
$$= -\sum P(M) P_M(E) \log P(M) - \sum P(M) P_M(E) \log P_M(E) + \sum P(M) P_M(E) \log P(E)$$
$$= |M| - |E| - \sum P(M)\, P_M(E) \log P_M(E)$$
The last term here may be interpreted as follows. Group together all the different keys that transform a fixed M into the same E, giving the total probability to the group, which will be $P_M(E)$. The last term is then the uncertainty of choosing among the groups leading out of M, averaged with the weights P(M). In case no group contains more than one element (at any rate, no group from an M with P(M) > 0), then |H| = |K| and Q(K) = Q(M). This is also clear since there is then a one-to-one correspondence between the keys and messages for any given E.
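Theorem 13 can be checked numerically on small systems. This sketch assumes each key is one-to-one (so K and E determine M) and that keys are chosen independently of messages; |X| denotes $-\sum P(X) \log P(X)$. The probabilities and the two example systems are illustrative: a shift on four symbols, and a variant in which keys 0 and 2 (and 1 and 3) act identically, so groups of two keys occur and Q(K) exceeds Q(M).

```python
import math
from collections import defaultdict


def entropy(dist):
    """|X| = -sum P(X) log P(X), in alternatives."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def theorem_13(msg_probs, key_probs, encipher):
    """Return ((Q(K), |M|-|E|+|K|), (Q(M), |M|-|E|+|H|)) for a small system."""
    p_e, p_me, p_ke = defaultdict(float), defaultdict(float), defaultdict(float)
    for m, pm in msg_probs.items():
        for k, pk in key_probs.items():
            e = encipher(k, m)
            p_e[e] += pm * pk
            p_me[(m, e)] += pm * pk
            p_ke[(k, e)] += pm * pk
    q_k = -sum(p * math.log2(p / p_e[e]) for (_, e), p in p_ke.items())
    q_m = -sum(p * math.log2(p / p_e[e]) for (_, e), p in p_me.items())
    h = -sum(p * math.log2(p / msg_probs[m]) for (m, _), p in p_me.items())  # |H|
    big_m, big_k, big_e = entropy(msg_probs), entropy(key_probs), entropy(p_e)
    return (q_k, big_m - big_e + big_k), (q_m, big_m - big_e + h)


def shift4(k, m):
    return (k + m) % 4


def paired_shift(k, m):
    return (m + k % 2) % 4


messages = {0: 0.5, 1: 0.25, 2: 0.15, 3: 0.1}
keys = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
```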
From the first equation of the theorem we may conclude that Q(K) = |K| in case |M| = |E|. This latter occurs in particular if all M's are equally likely, all E's are equally likely, and there are the same number of each. It is easy to see that this is the case with a language in which every letter is equally likely and independent, when almost any of the simple ciphers are used.
If we have a product system S = T R, it is to be expected that the second enciphering process does not decrease the equivocation of message, and this is actually true, as can be shown by the methods used above. If T and R commute, either may be considered as being the first, and hence in this case the equivocation with S is not less than the maximum for the two systems R and T. Simple examples show that this does not necessarily hold if R and T do not commute.
Theorem 14: The equivocation in message of a product system S = T R is not less than that when only R is used. If T R = R T, it is not less than the maximum of those for R and T alone.
If we have a product of several systems R S T U, we can of course extend this to say that the equivocation of R S T U is not less than that of S T U, which is not less than that for T U, etc.
There is no similar theorem for the inner product since, for example, if T and R are inverse processes, their inner product is the identity and the resulting equivocation is zero.
Suppose we have a system T which can be written as a weighted sum of several systems R, S, ..., U
$$T = p_1 R + p_2 S + \cdots + p_m U, \qquad \sum_i p_i = 1$$
and that systems R, S, ..., U have equivocation characteristics $Q_1, Q_2, \ldots, Q_m$.
Theorem 15: The equivocation Q of a weighted sum of systems is bounded by the inequalities
$$\sum_i p_i Q_i \le Q \le \sum_i p_i Q_i - \sum_i p_i \log p_i$$
These are the best limits possible. The Q's may refer either to key or to message.
The upper limit is achieved, for example, in strongly ideal systems (to be described later) where the decomposition is into the simple transformations of the system. The lower limit is achieved if all the systems R, S, ..., U go to completely different cryptogram spaces. This theorem is also proved by the general inequalities governing equivocation,
$$Q_A(B) \le Q(B) \le Q(A) + Q_A(B)$$
We identify A with the particular system being used and B with the key or message.
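A small numerical illustration of Theorem 15, using hypothetical two-message systems: R gives no secrecy, S swaps the two cryptograms with probability 1/2, and S_disjoint does the same into a separate cryptogram space. The equivocation of the weighted sum falls between the two bounds, and the lower bound is attained when the cryptogram spaces are disjoint. A system is represented as a list of (P(K), table) pairs; all names are illustrative.

```python
import math
from collections import defaultdict


def message_equivocation(msg_probs, system):
    """Q(M) for a system given as a list of (P(K), {M: E}) pairs."""
    p_e, p_me = defaultdict(float), defaultdict(float)
    for pk, table in system:
        for m, pm in msg_probs.items():
            p_e[table[m]] += pk * pm
            p_me[(m, table[m])] += pk * pm
    return -sum(p * math.log2(p / p_e[e]) for (_, e), p in p_me.items() if p > 0)


def weighted_sum(weights, systems):
    """T = p_1 R + p_2 S + ...: choose system i with probability p_i, then a key of it."""
    return [(w * pk, table) for w, system in zip(weights, systems) for pk, table in system]


def bounds(weights, qs):
    """The lower and upper limits of Theorem 15."""
    low = sum(w * q for w, q in zip(weights, qs))
    return low, low - sum(w * math.log2(w) for w in weights if w > 0)


msgs = {0: 0.7, 1: 0.3}
R = [(1.0, {0: "a", 1: "b"})]                                    # no secrecy
S = [(0.5, {0: "a", 1: "b"}), (0.5, {0: "b", 1: "a"})]           # same cryptogram space
S_disjoint = [(0.5, {0: "c", 1: "d"}), (0.5, {0: "d", 1: "c"})]  # separate space
w = [0.5, 0.5]
```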
There is a similar theorem for weighted sums of languages.
Theorem 16: Suppose a system can be applied to languages $L_1, L_2, \ldots, L_m$ and has equivocation characteristics $Q_1, Q_2, \ldots, Q_m$. When applied to the weighted sum $\sum_i p_i L_i$, the equivocation Q is bounded by
$$\sum_i p_i Q_i \le Q \le \sum_i p_i Q_i - \sum_i p_i \log p_i$$
These limits are the best possible, and the equivocations in question can be either for key or message.
The proof here is essentially the same as for the preceding case.
An important consequence of the result
$$Q(K) = |K| + |M| - |E|$$
is the following.
Theorem 17: In any closed system, or any system where the total number of possible cryptograms is equal to the number of possible messages of N letters,
$$Q(K) \ge |K| - (|M_0| - |M|) = |K| - D_N$$
where $|M_0| = \log H$, with H the number of possible messages of N letters. $D_N$ is the total redundancy for N letters.
This is true since $|M_0| \ge |E|$, the equality holding only if all cryptograms are equally likely. The theorem states that in a closed system the key is determined only by the redundancy of the language: the equivocation can decrease only as the redundancy comes into action, and at no greater rate.
Suppose we have a pure system, and let the different residue classes of messages be $C_1, C_2, \ldots, C_r$. The corresponding set of residue classes of cryptograms is $C'_1, \ldots, C'_r$. The probability of each E in $C'_i$ is the same:
$$P(E) = \frac{P(C_i)}{\varphi_i}, \qquad E \in C'_i$$
where $\varphi_i$ is the number of different messages in $C_i$. Thus
$$|E| = -\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}$$
Substituting in our equation for Q(K), we obtain:
Theorem 18: For a pure cipher
$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}$$
This result can be used to compute Q in many cases of interest.
From the analytic point of view, pure ciphers have a simple structure. If a cryptogram is intercepted, its residue class gives the complete information obtained by the cryptanalyst. Within the residue class the system is perfect: each message in the class has an a posteriori probability equal to its a priori probability. For large N, beyond the unicity point, there will usually be only one M in the class of reasonable probability, and the problem is to determine this M.
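The theorem on equivocation of pure ciphers can be altered to show this. Let k be the number of keys, all equally likely, so that |K| = log k; each M in $C_i$ is carried into each E of $C'_i$ by $k/\varphi_i$ keys, so that $\sum_i P(C_i) \log (k/\varphi_i)$ is the residual key equivocation $Q_M(K)$ of Section 21. We have
$$\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = \sum_i P(C_i) \log P(C_i) + \sum_i P(C_i) \log \frac{k}{\varphi_i} - \sum_i P(C_i) \log k$$
$$= \sum_i P(C_i) \log P(C_i) + Q_M(K) - |K|$$
Hence
$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = |M| + Q_M(K) + \sum_i P(C_i) \log P(C_i)$$
and
$$Q(M) = |M| - \left[-\sum_i P(C_i) \log P(C_i)\right]$$
The equivocation of message is the equivocation of message before the cryptogram was intercepted, less the information imparted by the specification of its residue class.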
21. Key Appearance Characteristic
Suppose the cryptanalyst has N letters of message and N letters of the equivalent cryptogram. Then he can calculate the a posteriori probabilities of the various keys on the basis of this information, and if N is small there will remain a certain equivocation of key. For example, in simple substitution, knowing 20 letters of message and cryptogram does not disclose the entire key, since only about 12 letters of the 26 will be represented. Thus there is a residual equivocation of log (26 - 12)! if exactly 12 letters appear.
We define the mean residual key equivocation as
$$Q_M(K) = -\sum_{E,M,K} P(E,M)\, P_{E,M}(K) \log P_{E,M}(K)$$
where P(E, M) is the a priori probability of having message M and cryptogram E, and $P_{E,M}(K)$ is the conditional probability of K with E and M given.
This may be written, by obvious arguments (assuming all keys equally likely),
$$Q_M(K) = \sum_{M,K} P(M,K) \log X(M,K)$$
where X(M, K) is the number of different keys from M in parallel with K, that is, which go to the same E as K.
For simple substitution let $P_x$ be the probability that a received cryptogram of N letters has x different letters appearing in it. Then
$$Q_M(K) = \sum_x P_x \log (26 - x)!$$
Approximately, by Stirling's formula,
$$\log (26 - x)! \approx (26 - x)\left[\log (26 - x) - \log e\right]$$
The bracketed terms vary slowly with x, and if $P_x$ is fairly well concentrated we may take the bracket out, replacing x by its mean value $\bar{x}_N$. This gives, after recombination,
$$Q_M(K) \approx \log (26 - \bar{x}_N)!$$
This residual key equivocation is shown for simple substitution on English in Fig. 12. It measures how much of the key has, on the average, not been used in enciphering N letters of text.
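The residual key equivocation and its approximation can be sketched with the gamma function, which gives log x! for non-integer x as well. The mean number of different letters is computed here under an independent-letter model, which is an assumption for illustration (English is not such a language); all names are hypothetical, and results are in alternatives.

```python
import math


def log2_factorial(x):
    """log2(x!) for real x >= 0, via the gamma function."""
    return math.lgamma(x + 1) / math.log(2)


def residual_key_equivocation(p_x):
    """Q_M(K) = sum over x of P_x log (26 - x)!, in alternatives.

    `p_x` maps x (the number of different letters in the text) to P_x.
    """
    return sum(p * log2_factorial(26 - x) for x, p in p_x.items())


def mean_distinct_letters(letter_probs, n):
    """Mean number of different letters among n independent letters."""
    return sum(1 - (1 - p) ** n for p in letter_probs)


def approx_residual(letter_probs, n):
    """The approximation Q_M(K) ~ log (26 - x_N)!, x_N the mean of x."""
    return log2_factorial(26 - mean_distinct_letters(letter_probs, n))


uniform = [1 / 26] * 26
```

With exactly 12 letters represented, `residual_key_equivocation({12: 1.0})` gives log (26 - 12)!, about 36.3 alternatives; with nothing intercepted the approximation returns the full key size log 26!, about 88.4 alternatives.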
Theorem 19: $Q(K) = Q(M) + Q_M(K)$
That is, the total key equivocation (when we don't know the message) is the sum of the message equivocation and the residual key equivocation, i.e. the equivocation there would be in the key if we did know the message. This follows from the fact that the key uniquely determines the message, and properties 4 and 5 in Section 1.
22. Equivocation for Simple Substitution on an Independent Letter Language
We will now calculate the mean equivocation in key or message when simple substitution is applied to a two-letter language, with probabilities p and q for 0 and 1 and successive letters independent. We have
$$Q(K) = Q(M) = -\sum_E P(E) \sum_K P_E(K) \log P_E(K)$$
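There are two keys, the identity and the inverting substitution, each with probability 1/2. The probability that E contains exactly s 0's in a particular permutation is
$$\tfrac{1}{2}\left(p^s q^{N-s} + q^s p^{N-s}\right)$$
and the a posteriori probabilities of the identity and inverting substitutions are respectively
$$\frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}, \qquad \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}$$
There are $\binom{N}{s}$ such cryptograms for each s, and hence the mean equivocation may be written
$$Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} \left[s \log p + (N-s) \log q - \log\left(p^s q^{N-s} + q^s p^{N-s}\right)\right]$$
$$= -N\left[p \log p + q \log q\right] + \sum_s \binom{N}{s} p^s q^{N-s} \log\left(p^s q^{N-s} + q^s p^{N-s}\right)$$
$$= N R + \tfrac{1}{2} \sum_s \binom{N}{s} \left(p^s q^{N-s} + q^s p^{N-s}\right) \log\left(p^s q^{N-s} + q^s p^{N-s}\right)$$
where $R = -(p \log p + q \log q)$. For p = 1/3, q = 2/3, and for p = 1/8, q = 7/8, Q has been calculated and is shown in Fig. 13.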
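The closed form can be checked against a direct computation. This sketch evaluates the last line of the derivation and, separately, sums $-P(E, K) \log P_E(K)$ over every binary cryptogram of length N for the two equally likely keys; the two agree for the probabilities used in Fig. 13. Logarithms are base 2, and the function names are illustrative.

```python
import math
from itertools import product


def q_formula(p, n):
    """Q(N) = N R + 1/2 sum_s C(N,s) (a+b) log(a+b), a = p^s q^(N-s), b = q^s p^(N-s)."""
    q = 1 - p
    rate = -(p * math.log2(p) + q * math.log2(q))
    total = n * rate
    for s in range(n + 1):
        ab = p ** s * q ** (n - s) + q ** s * p ** (n - s)
        total += 0.5 * math.comb(n, s) * ab * math.log2(ab)
    return total


def q_direct(p, n):
    """Key equivocation summed over every cryptogram E of n binary letters.

    The two keys (identity and the 0 <-> 1 inversion) have probability 1/2
    each; s counts the 0's in E.
    """
    q = 1 - p
    total = 0.0
    for e in product((0, 1), repeat=n):
        s = e.count(0)
        a = 0.5 * p ** s * q ** (n - s)      # P(E, identity key)
        b = 0.5 * q ** s * p ** (n - s)      # P(E, inverting key)
        total -= a * math.log2(a / (a + b)) + b * math.log2(b / (a + b))
    return total
```

At N = 0 both give one alternative, the full key uncertainty, and the value falls toward zero as N grows.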
Now assume the language contains r different letters chosen independently and with probabilities $p_1, p_2, \dots, p_r$. By approximately the same argument we have
$$Q(N) = -\sum \binom{N}{s_1, \dots, s_r} p_1^{s_1} p_2^{s_2} \cdots p_r^{s_r} \log \frac{p_1^{s_1} \cdots p_r^{s_r}}{\sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}}$$
where $\sum s_i = N$ and $\sum_\alpha$ is over all permutations $\alpha_1, \dots, \alpha_r$ of $1, 2, \dots, r$. Hence, by obvious transformations,
$$Q(N) = NR + \frac{1}{r!} \sum \binom{N}{s_1, \dots, s_r} \Big(\sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}\Big) \log \sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}$$
where $R = -\sum p_i \log p_i$. In particular,
$$Q(0) = \frac{1}{r!}\, r! \log r! = \log r! = |K|$$
$$Q(1) = R + \frac{1}{r!}\, r\,(r-1)! \log (r-1)! = R + \log (r-1)!$$
This checks the evident answer for Q(1): the first symbol has equivocation R, and the parts of the key not used add $\log (r-1)!$.
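A direct transcription of the r-letter formula, assuming small r and N since it sums over every composition $(s_1, \dots, s_r)$ and every permutation α. It is a sketch in Python with base-2 logarithms; the names are illustrative, and the special cases Q(0) = log r! and Q(1) = R + log (r-1)! serve as checks.

```python
import itertools
import math

def compositions(n, r):
    """All tuples (s_1, ..., s_r) of non-negative integers summing to n."""
    if r == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, r - 1):
            yield (first,) + rest

def q_r_letters(probs, n):
    """Q(N) in bits: NR + (1/r!) sum multinom * A(s) log A(s), where
    A(s) = sum over permutations alpha of prod_j p_{alpha_j}^{s_j}."""
    r = len(probs)
    rate = -sum(p * math.log2(p) for p in probs)
    perms = list(itertools.permutations(range(r)))
    total = n * rate
    for s in compositions(n, r):
        multinom = math.factorial(n) // math.prod(math.factorial(x) for x in s)
        a = sum(math.prod(probs[alpha[j]] ** s[j] for j in range(r)) for alpha in perms)
        total += multinom * a * math.log2(a) / math.factorial(r)
    return total
```

For r = 2 the sum reduces term by term to the two-letter formula, with 1/r! = 1/2.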
23. The Equivocation Characteristic for a "Random" Closed Cipher
In the preceding section we have calculated the equivocation characteristic for a simple substitution applied to an independent letter language. This is about the simplest type of cipher and the simplest language structure possible, yet already the formulas are so involved as to be nearly useless. What are we to do with cases of practical interest, say the involved transformations of a fractional transposition system applied to English with its extremely complex statistical structure? This complexity itself suggests the method of approach. Sufficiently complicated problems can frequently be solved statistically. In order to do this we define the notion of a "random" cipher.
We suppose that the possible messages of length N can be divided into two groups, one group of high and fairly uniform probability, while the total probability in the second group is small. This is usually possible in information theory if the messages have any reasonable length. Let the total number of messages be
$$H = 2^{R_0 N}$$
where $R_0$ is the maximum rate and N the number of letters. The high probability group will contain about
$$S = 2^{R N}$$
messages, where R is the statistical rate.
The deciphering operation defines a function which can be thought of as a series of lines, k for each E, going back to various M's. By a random cipher we will mean one in which all k keys are equally likely and the k lines from any E go back to random M's. The equivocation in key is given by
$$Q(K) = -\sum_E \sum_K P(E)\, P_E(K) \log P_E(K)$$
The probability of exactly m lines going back to the high probability group is
$$\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m}$$
If a cryptogram with m lines going to high probability messages is intercepted, the equivocation is log m. The probability of intercepting such a cryptogram is easily seen to be $mH/Sk$. Hence the mean equivocation is
$$Q = \frac{H}{Sk} \sum_{m=1}^{k} \binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} m \log m$$
We wish to find an approximation to this for large k. If the expected value of m, namely $\bar m = (S/H)k$, is $\gg 1$, the variation of log m over the range where the binomial distribution assumes large values will be small, and we can replace log m by $\log \bar m$. This then comes out of the summation, leaving the expected value of m, which cancels the factor $H/Sk$. Hence in this condition, using $|M| = \log S$, $|M_0| = \log H$ and $D = R_0 - R$,
$$\begin{aligned}
Q &= \log \frac{S}{H} k \\
&= \log S - \log H + \log k \\
&= |K| - |M_0| + |M| \\
&= |K| - ND
\end{aligned}$$
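If $\bar m$ is small compared to the large k, the binomial distribution can be approximated by a Poisson distribution:*
$$\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} \approx \frac{e^{-\lambda} \lambda^m}{m!}, \qquad \lambda = \frac{S}{H} k$$
Hence
$$\begin{aligned}
Q &= \frac{1}{\lambda} e^{-\lambda} \sum_{m=2}^{\infty} \frac{\lambda^m}{m!}\, m \log m \\
&= e^{-\lambda} \sum_{m=1}^{\infty} \frac{\lambda^m}{m!} \log (m+1)
\end{aligned}$$
where we have written (m + 1) for m. This may be used in the region where λ is near unity. For λ ≪ 1 the only important term in the series is m = 1; omitting the others (and measuring |K| and D in binary units),
$$\begin{aligned}
Q &\approx e^{-\lambda} \lambda \log 2 \\
&\approx \lambda \log 2 \\
&= 2^{|K| - ND} \log 2
\end{aligned}$$
*Fry, Probability and Its Engineering Uses, p. 214.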
Thus Q(K) starts off at |K| and decreases linearly with slope -D out to the neighborhood of N = |K|/D. After a short transition region, Q follows an exponential with "half-life" distance 1/D if D is in binary alternatives per letter. If D is in digits per letter, 1/D is the distance for a decrease by a factor of 10. The behavior is shown in Fig. 14 with the approximating curves.
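The exact random-cipher sum and its two approximations can be compared numerically. In this Python sketch |K| and D are taken in bits, so $k = 2^{|K|}$ and $S/H = 2^{-ND}$; the values |K| = 10 and D = 1 used in the checks are illustrative only, and the function names are hypothetical. The binomial terms are computed in log space to avoid overflow.

```python
import math

def random_cipher_q_key(key_bits, n, d):
    """Exact mean key equivocation (bits) for the random cipher:
    Q = (H/Sk) sum_m C(k,m) (S/H)^m (1 - S/H)^(k-m) m log2 m,
    with k = 2**key_bits equally likely keys and S/H = 2**(-n*d)."""
    k = 2 ** key_bits
    ratio = 2.0 ** (-n * d)          # S/H
    if ratio >= 1.0:                 # N = 0: every line lands in the high group
        return math.log2(k)
    lam = k * ratio                  # expected m, also the Poisson parameter
    total = 0.0
    for m in range(2, k + 1):        # the m = 1 term contributes 1 * log 1 = 0
        log_term = (math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1)
                    + m * math.log(ratio) + (k - m) * math.log1p(-ratio))
        total += math.exp(log_term) * m * math.log2(m)
    return total / lam               # 1/lambda is the factor H/Sk

def linear_approx(key_bits, n, d):
    """Q ~ |K| - ND, valid while the expected m is >> 1."""
    return key_bits - n * d

def tail_approx(key_bits, n, d):
    """Q ~ lambda log2 2 = 2**(|K| - ND), valid when lambda << 1."""
    return 2.0 ** (key_bits - n * d)
```

With these parameters the exact curve sits within a few hundredths of a bit of |K| - ND for small N and within about one percent of $2^{|K| - ND}$ well past N = |K|/D.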
By a similar argument, given in the appendix, the equivocation of message can be calculated. It is
$$\begin{aligned}
Q(M) &= R_0 N && \text{for } R_0 N \ll Q(K) = |K| - DN \\
Q(M) &= Q(K) && \text{for } R_0 N \gg Q(K) \\
Q(M) &= Q(K) - \varphi(N) && \text{for } R_0 N \sim Q(K)
\end{aligned}$$
where φ(N) is the function of Fig. 14 with the N scale reduced by a factor of $D/R_0$. Q(M) rises linearly with slope $R_0$ until this line intersects the Q(K) line. After a rounded transition it follows Q(K) down.
Most ciphers have an equivocation characteristic of this general type, approaching zero rather sharply. We will call the number of letters required for a nearly unique solution the unicity distance.
24. Application to Standard Ciphers.
The characteristic derived for the random cipher may be expected to apply approximately in many cases, provided some precautions are taken and certain corrections are made. The main points to be observed are the following.
1. We assumed in deriving the random characteristic that the possible decipherments of a cryptogram are a random selection from the possible messages. This is not true in actual cases, but becomes more nearly true as the complexity of the operations used in the enciphering process and the complexity of the language structure increase. The more complicated the type of cipher, the more it should follow the random characteristic. In the case of a transposition cipher it is clear that letter frequencies are preserved. This means that the possible decipherments are chosen from a more limited group, not the entire message space, and the formula should be changed. In place of $R_0$ one uses $R_1$, the rate for independent letters but with the regular frequencies. This changes the redundancy from
$$D = R_0 - R = .707 \text{ digits/letter}$$
to
$$D' = R_1 - R = .538 \text{ digits/letter}$$
and the equivocation reduces more slowly. In some other cases a definite tendency toward returning the decipherments to high probability messages can be seen. If there is no clear tendency of this sort, and the system is fairly complicated, and the language a natural one (with its very complex statistical structure), then it is reasonable to make the random cipher assumption.
2. In many cases the key does not all appear as soon as it might. For example, in simple substitution one must wait a long time to find all letters of the alphabet represented in the message and thus deduce the complete key. The message becomes unique long before this point. Obviously our random assumption falls down in such a case, since all the different keys which differ only in the letters not yet appearing lead back to the same message, and are not randomly distributed. This error is easily corrected by the use of the key appearance characteristic: at a particular N, one uses in the formula the amount of key that may be expected to have appeared at that point.
3. There are certain "end effects" due to the definite starting of the message which produce a discrepancy from the random characteristic. If we take a random starting point in English text, the first letter (when we do not observe the preceding letters) has a possibility of being any letter with the ordinary letter probabilities. The next letter is more completely specified, since we then have digram frequencies. This decrease in choice value continues for some time. The effect of this on the curve is that the straight line part is displaced, and approached by a curve depending on how much the statistical structure of the language is spread out over adjacent letters. As a first approximation the curve can be corrected by shifting the line over to the half redundancy point, i.e., the number of letters where the language redundancy is half its final value.
If account is taken of these three effects, reasonable estimates of the equivocation characteristic and unicity point can be made. The calculation can be done graphically as indicated in Figs. 15 and 16. One draws the key appearance characteristic $|K| - Q_M(K)$ and the total redundancy curve $|M_0| - |M|$ (which is usually sufficiently well represented by the line ND). The difference between these out to the neighborhood of their intersection is Q(M).
For simple substitution the characteristic is shown in Fig. 17. In so far as experimental checks could be carried out, they fit this curve very well. For example, the unicity point, at about 27 letters, can be shown experimentally to lie between the limits 22 and 30. With 30 letters one nearly always has a unique solution to a cryptogram of this type, and with 22 it is usually easy to find a number of them.
With transposition of period d, the unicity point occurs at about 1.5 d log (d/e). This also checks fairly well experimentally. Note that in this case Q is defined only for integral multiples of d.
With the Vigenere the unicity point will occur at about 2d + 2 letters, and this too is about right. The Vigenere characteristic with the same key size as simple substitution will be approximately as shown in Fig. 18. The Vigenere, Playfair and Fractional cases are more likely to follow the theoretical formulas for random ciphers than simple substitution and transposition. The reason for this is that they are more complex and give better mixing characteristics to the messages on which they operate.
The mixed alphabet Vigenere (each of d alphabets mixed independently and used sequentially) has a key size
$$|K| = d \log 26! = 26.3\, d$$
and its unicity point should be at about 53d + 2 letters.
These conclusions can also be put to a rough experimental test with the Caesar type cipher. In the particular cryptogram analyzed in Table I, Section 19, the function Q(N) has been calculated and is given below, together with the values for a random cipher.
N                 0      1      2      3      4      5
Q (observed)      1.41   1.25   1.00   .60    .34    0
Q (calculated)    1.41   1.25   .98    .54    .15    .03
The agreement is seen to be quite good, especially when we remember that the observed Q should actually be the average of many different cryptograms, and that D for the larger values of N is only roughly estimated.
It appears then that the random cipher analysis
can be used to estimate equivocation characteristics and
the unicity distance for the ordinary types of ciphers.
25. Solving Systems Using Only N-Gram Structure.
The preceding analysis can also be applied to cases where the cryptanalyst is assumed to know, or to use, only a limited part of the structure of the language. If no data about the language other than the digram frequencies are used in solving cryptograms, the equivocation curves may be computed using for the redundancy curve that obtained from $D_2$ alone. This curve lies below the curve for all redundancy, and the unicity point will therefore be moved to a larger N. Fig. 19 shows the Q curves for simple substitution on normal English when the cryptanalyst uses only digram structure.
26. Validity of a Cryptogram Solution.
The equivocation formulas are relevant to questions which sometimes arise in cryptographic work regarding the validity of an alleged solution to a cryptogram. In the history of cryptography one finds many cryptograms, or possible cryptograms, where clever analysts have found a "solution." It involved, however, such a complex process, or the material was so scanty, that the question arose as to whether the cryptanalyst had "read a solution" into the cryptogram. See for example the Bacon-Shakespeare ciphers and the "Roger Bacon" manuscript.*
In general we may say that if a proposed system and key solves a cryptogram for a length of material considerably greater than the unicity distance, the solution is trustworthy. If the material is of the same order as the unicity distance or shorter, the solution is highly suspicious.
The effect of redundancy in gradually producing a unique solution to a cipher can be thought of in another way which is helpful. The redundancy is essentially a series of conditions on the letters of the message, which insure that it be statistically reasonable. These consistency conditions produce corresponding consistency conditions in the cryptogram. The key gives a certain amount of freedom to the cryptogram, but as more and more letters are intercepted, the consistency conditions use up the freedom allowed by the key. Eventually there is only one message and key which satisfy all the conditions, and we have a unique solution. In the random cipher the consistency conditions are in a sense "orthogonal" to the "grain of the key" and have their full effect in eliminating messages and keys as rapidly as possible. This is the usual case. However, by proper design it is possible to "line up" the redundancy of the language with the "grain of the key" in such a way that the consistency conditions are automatically satisfied and Q does not approach zero. These "ideal" systems are of such a nature that the transformations $T_i$ all induce the same probabilities in the E space. Ideal characteristics are shown in Fig. 20.
27. Ideal Secrecy Systems.
We have seen that perfect secrecy requires an infinite amount of key if messages of unlimited length are allowed. With a finite key size, the equivocation of key and message generally approaches zero, but not necessarily so. In fact it is possible for Q(K) to remain constant at its initial value |K|. Then, no matter how much material is intercepted, there is not a unique solution but many of comparable probability. We will define an "ideal" system as one in which Q(K) and Q(M) do not approach zero as N → ∞. A "strongly ideal" system is one in which Q(K) remains constant at |K|.
*See Fletcher Pratt, "Secret and Urgent"
An example is a simple substitution on an artificial language in which all letter probabilities are the same and each letter is independently chosen. It is clear that Q(K) = |K|, and Q(M) rises linearly along a line of slope $R_0$ until it strikes the line Q(K), after which it remains constant at this value.
With natural languages it is in general possible to approximate the ideal characteristic: the unicity point can be made to occur for as large N as is desired. The complexity of the system needed usually goes up rapidly as we attempt to do this, however. It is not always possible to actually attain the ideal characteristic with any system of finite complexity.
To approximate the ideal equivocation, one may first operate on the message with a transducer which reduces it to the normal form, i.e., with all redundancies removed. After this almost any simple ciphering system (substitution, transposition, Vigenere, etc.) is satisfactory. The more elaborate the transducer and the nearer the output is to the normal form, the more closely will the secrecy system approximate the ideal characteristic.
Theorem 20: A necessary and sufficient condition that T be strongly ideal is that, for any two keys, $T_i^{-1} T_j$ is a measure preserving transformation of the message space into itself.
This is true since the a posteriori probability of each key is equal to its a priori probability if and only if this condition is satisfied.
28. Examples of Ideal Secrecy Systems.
Suppose our language consists of a sequence of letters all chosen independently and with equal probability. Then the redundancy is zero, $|M_0| = |M|$, and from Theorem 11, Q(K) = |K|. We obtain the result
Theorem 21: If all letters are equally likely and independent, any closed cipher is strongly ideal.
The equivocation of message will rise along the key appearance characteristic $|K| - Q_M(K)$, which will usually approach |K|, although in some cases it does not. In the cases of N-gram substitution, transposition, Vigenere and variations, fractional, etc., we have strongly ideal systems for this simple language, with Q(M) → |K| as N → ∞.
If the letters are independent but are not all equally probable, the transposition cipher characteristics remain essentially the same. The asymptotic equivocations of both key and message are clearly |K|. In the substitution cipher they will be less. If all the letter probabilities are different, then the asymptotic equivocations of both key and message are zero. The letters can all eventually be determined by frequency count (apart from certain exceptional sequences of zero measure). Suppose now that there are 7 letters with probabilities
$$p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7$$
In this case we cannot separate $p_1$ from $p_2$, or $p_4$, $p_5$ and $p_6$ from each other, but the different unequal probability groups can eventually be separated.
If all substitutions are a priori equally likely, there will be an asymptotic uncertainty among 2! × 3! = 12 equally likely (a posteriori) keys. Hence the asymptotic Q will be log 2! 3! = log 12. In general it is clear that the asymptotic equivocation with a substitution where the different substitutions are equally likely is
$$Q_\infty(M) = Q_\infty(K) = \log H$$
where H is the order of the group of substitutions on the letter probabilities $p_1, \dots, p_n$ which leave this set invariant.
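The group order H is the product of the factorials of the sizes of the equal-probability classes, since any permutation within a class leaves the set of probabilities invariant. A minimal Python sketch, using hypothetical probabilities that satisfy $p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7$; exact fractions avoid spurious floating-point inequalities. The function names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction

def invariant_group_order(probs):
    """Order H of the group of substitutions leaving the probability set
    invariant: the product of m! over the sizes m of the equal-probability classes."""
    return math.prod(math.factorial(m) for m in Counter(probs).values())

def asymptotic_equivocation_digits(probs):
    """Q_inf(K) = Q_inf(M) = log10 H, in digits, all substitutions equally likely a priori."""
    return math.log10(invariant_group_order(probs))

if __name__ == "__main__":
    # Hypothetical values with p1 = p2 < p3 < p4 = p5 = p6 < p7 (summing to 1).
    probs = [Fraction(1, 20), Fraction(1, 20), Fraction(1, 10),
             Fraction(3, 20), Fraction(3, 20), Fraction(3, 20), Fraction(7, 20)]
    print(invariant_group_order(probs))           # 2! * 3! = 12
    print(asymptotic_equivocation_digits(probs))
```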
More generally we can consider an arbitrary pure system T and a pure language L. Suppose that T operates only "locally" on the letters of M, in the sense that the nth letter of the cryptogram depends only on n and a certain finite number of the letters of M in the neighborhood of the nth one:
$$e_n = f(K, n, m_{n-p}, \dots, m_{n+p})$$
Then we can show that there is a certain subgroup of the transformations $T_i^{-1} T_j$ which are probability preserving in the language L. In the limiting cases these would consist of the identity or of the whole group $T_i^{-1} T_j$.
Theorem 22: Under these conditions the asymptotic equivocation of key is the logarithm of the order of this subgroup of measure preserving transformations.
An ideal secrecy system suffers from a number of disadvantages.
1. The system must be closely matched to the language. This requires an extensive study of the structure of the language by the designer. Also, a change in statistical structure, or a selection from the set of possible messages as in the case of probable words (words expected in this particular cryptogram), renders the system vulnerable to analysis.
2. The structure of natural languages is extremely complicated, and this is reflected in a complexity of the transformations required to reduce them to the normal form. Thus any machine to perform this operation must necessarily be quite involved, at least in the direction of information storage, since a "dictionary" of magnitude greater than that of an ordinary dictionary is to be expected.
3. In general, reduction of a natural language to a normal form introduces a bad propagation of error characteristic. An error in transmission of a single letter produces a region of changes near it of size comparable to the length of statistical effects in the original language.
29. Multiple Substitute Ideal Systems.
There is another way of obtaining ideal or nearly ideal characteristics, using multi-valued secrecy systems. Suppose our language contains only three letters with probabilities 1/8, 3/8 and 4/8, and that successive letters in a message are chosen independently. Let there be 1 substitute for the first letter, 3 for the second and 4 for the third, and choose at random among the possible substitutes for a letter. Each of the eight substitutes then occurs with probability 1/8, independently of the others, and it is clear that this system is ideal. If the different probabilities are incommensurable, we cannot exactly achieve the ideal behavior, but can approximate it as closely as desired by using enough substitutes.
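The three-letter example can be checked exactly: with 1, 3 and 4 substitutes for letters of probability 1/8, 3/8 and 4/8, every substitute symbol has probability 1/8. The Python sketch below uses hypothetical letter names a, b, c and symbols 0 to 7; the particular assignment of symbols to letters is an arbitrary assumption (in practice it would be part of the key).

```python
import random
from fractions import Fraction

LETTER_PROBS = {"a": Fraction(1, 8), "b": Fraction(3, 8), "c": Fraction(4, 8)}
# Hypothetical assignment of the eight substitutes: 1, 3 and 4 per letter.
SUBSTITUTES = {"a": [0], "b": [1, 2, 3], "c": [4, 5, 6, 7]}

def symbol_distribution():
    """Exact probability of each cryptogram symbol when a substitute is
    chosen uniformly at random among those of the plaintext letter."""
    dist = {}
    for letter, p in LETTER_PROBS.items():
        subs = SUBSTITUTES[letter]
        for symbol in subs:
            dist[symbol] = dist.get(symbol, Fraction(0)) + p / len(subs)
    return dist

def encipher(message, rng):
    return [rng.choice(SUBSTITUTES[ch]) for ch in message]

def decipher(cryptogram):
    inverse = {s: letter for letter, subs in SUBSTITUTES.items() for s in subs}
    return "".join(inverse[s] for s in cryptogram)
```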
If the language is more complex, with transition probabilities, this general method can still be used, but it becomes more involved. Suppose the choice of a letter depends only on the two preceding letters, not on any more remote part of the message. The transition probabilities $p_{ij}(k)$ then completely describe the statistical structure of the language. We supply substitutes for k when it follows i, j in proportion to $p_{ij}(k)$: of all our m substitutes, $m\, p_{ij}(k)$ represent k after the pair i, j. As before, one chooses from the possible substitutes for a letter at random. The cryptogram will then be a random sequence of the m substitute letters.
As an example, suppose the $p_i(j)$ are the only statistics of the language and the values are given by
  i \ j     1     2     3
    1      .1    .3    .6
    2      .2    .5    .3
    3      .9    .1     0
With 10 substitutes 0, 1, 2, ..., 9 we construct a substitution table assigning substitutes (chosen randomly) in proportion to the frequencies. The following is a typical key.
  i \ j     1                    2             3
    1       7                    0,5,6         1,2,3,4,8,9
    2       3,9                  1,2,5,6,7     0,4,8
    3       0,1,2,3,5,6,7,8,9    4             (none)
If a 3 follows a 2 in the message we substitute one of 0, 4, 8 for it, the choice being random. A second table must be supplied for the first letter of the message, corresponding to the unconditional probabilities of the three letters.
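A sketch of enciphering and deciphering with the typical key above. The text does not give the first-letter table, so `FIRST_LETTER_KEY` here is a hypothetical placeholder; the conditional table is transcribed from the key. Deciphering is unique because, given the preceding plaintext letter, each digit belongs to exactly one column.

```python
import random

# Typical key from the table: KEY[i][j] lists the digits that may stand for
# letter j when it follows letter i (each row uses all ten digits once).
KEY = {
    1: {1: [7], 2: [0, 5, 6], 3: [1, 2, 3, 4, 8, 9]},
    2: {1: [3, 9], 2: [1, 2, 5, 6, 7], 3: [0, 4, 8]},
    3: {1: [0, 1, 2, 3, 5, 6, 7, 8, 9], 2: [4], 3: []},  # 3 never follows 3
}
# Hypothetical first-letter table (not given in the text).
FIRST_LETTER_KEY = {1: [0, 1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}

def table_for(prev):
    return FIRST_LETTER_KEY if prev is None else KEY[prev]

def encipher(message, rng):
    """Replace each letter by a random substitute from its predecessor's row."""
    out, prev = [], None
    for letter in message:
        out.append(rng.choice(table_for(prev)[letter]))
        prev = letter
    return out

def decipher(cryptogram):
    """Invert the substitution: the preceding plaintext letter selects the row."""
    message, prev = [], None
    for digit in cryptogram:
        letter = next(j for j, subs in table_for(prev).items() if digit in subs)
        message.append(letter)
        prev = letter
    return message
```

A message containing a 3 after a 3 has probability zero in this language, and `encipher` would raise `IndexError` on it.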
Although of theoretical interest, it is doubtful whether such systems would be of much use practically, because of their complexity and message expansion in ordinary cases. However, the first approximation to such systems, matching letter frequencies, has been used in ciphers and is standard practice in codes (where one matches word frequencies).
30. Equivocation Rate.
We now return briefly to cases where the key is not finite, but is supplied constantly, as in the Vernam system and the running key cipher. In such cases we may define equivocation "rates". One considers the equivocation Q(N) of the message when N letters have been intercepted. The equivocation rate for the message is defined as the limit (assuming it exists)
$$Q' = \lim_{N \to \infty} \frac{Q(N)}{N}$$
The rate for equivocation of key would be defined similarly, using the equivocation in the part of the key that has been used only, but of course these two are the same. There are results for these parameters analogous to those obtained in the finite key cases. Let R' be the mean rate of using key.
Theorem 23: $Q' \le R'$.
In case the equality holds we have the analogue of ideal systems, where the complete information of the key goes into equivocation. If R' > R, the rate of the message source, we can obtain perfect secrecy. In fact we may define perfect secrecy as the case in which Q' = R.
In the random case we have the analogous result
$$Q' = R' - D$$
31. Further Remarks on Equivocation and Redundancy.
We have taken the redundancy of "normal English" to be about .7 digits per letter, or 50% of $R_0$. This is on the assumption that word divisions were omitted. It is an approximate figure based on statistical structure extending over lengths of the order of perhaps 8 letters, and assumes the text to be of an ordinary type, such as newspaper writing, literary work, etc. Various methods of calculating redundancy have been devised and will be described in the memorandum on information mentioned in the introduction. We may note here two methods of roughly estimating this number which are of cryptographic interest.
A running key cipher is a Vernam type system where, in place of a random sequence of letters, the key is a meaningful text. Now it is known that running key ciphers can usually be solved uniquely. This shows that English can be reduced by a factor of two to one and implies a redundancy of at least 50%. This figure cannot be reduced very much, however, for a number of reasons, unless long range "meaning" structure of English is considered.
The running key cipher can be easily improved to lead to ciphering systems which could not be solved without the key. If one uses in place of one English text about 4 different texts as key, adding them all to the message, a sufficient amount of key has been introduced to produce a high positive equivocation rate. Another method would be to use, say, every 10th letter of the text as key. The intermediate letters are omitted and cannot be used at any other point of the message. This has the same effect, since the mean rate for these spaced letters must be over .8 $R_0$.
These methods might be useful for spies or diplomats who could use books or magazines for the key source.
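The two improvements of the running key cipher can be sketched as follows, assuming 26-letter A to Z text with word divisions and punctuation dropped, and Vernam-type addition mod 26. The function names are illustrative, and the key texts would be whatever books or magazines the correspondents agree on.

```python
import string

ALPHABET = string.ascii_uppercase

def letters_only(text):
    """Uppercase A-Z letters only (word divisions and punctuation dropped)."""
    return [ch for ch in text.upper() if ch in ALPHABET]

def combined_key(texts, length):
    """Add several key texts letter by letter (mod 26)."""
    streams = [letters_only(t) for t in texts]
    if any(len(s) < length for s in streams):
        raise ValueError("each key text must be at least as long as the message")
    return [sum(ALPHABET.index(s[i]) for s in streams) % 26 for i in range(length)]

def spaced_key(text, length, step=10):
    """Use every step-th letter of a single text as key; the rest is discarded."""
    stream = letters_only(text)[step - 1::step]
    if len(stream) < length:
        raise ValueError("key text too short")
    return [ALPHABET.index(ch) for ch in stream[:length]]

def encipher(message, key):
    m = letters_only(message)
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch, k in zip(m, key))

def decipher(cryptogram, key):
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch, k in zip(cryptogram, key))
```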
A second way of showing the high redundancy of English is to delete all vowels from a passage. In general it is possible to fill them in again uniquely and recover the original, without knowing it in advance. As the vowels constitute about 40% of the text, this puts a lower limit on the redundancy. Actually there is considerable redundancy left, the various letter and digram frequencies being far from uniform.
This suggests a simple way of greatly improving almost any simple ciphering system: first delete all vowels, or as much of the message as possible without running the risk of multiple solutions, and then encipher the residue. Since this reduces the redundancy by a factor of perhaps 3 or 4 to 1, the unicity point will be moved out by this factor. This is one way of approaching ideal systems, using the decipherer's knowledge of English as part of the deciphering system.
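A sketch of the vowel-deletion preprocessing followed by a simple substitution. Only the mechanical steps are shown; restoring the vowels after decipherment relies on the reader's knowledge of English and is not attempted. Treating Y as a consonant is an assumption, as are the function names.

```python
import random
import string

ALPHABET = string.ascii_uppercase
VOWELS = set("AEIOU")

def delete_vowels(text):
    """Keep only consonants (A-Z minus AEIOU); word divisions are dropped too."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET and ch not in VOWELS)

def random_substitution(rng):
    """A random simple-substitution key: plaintext letter -> cipher letter."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def encipher(text, key):
    return "".join(key[ch] for ch in text)

def decipher(text, key):
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse[ch] for ch in text)
```

For example, `delete_vowels("Attack at dawn")` gives `"TTCKTDWN"`, which is then enciphered in place of the full message.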
Two extremes of redundancy in English prose are represented by Basic English and Joyce's "Finnegans Wake". The Basic English vocabulary consists of only 850 words, and a rough estimate puts the redundancy at about 70%. A cipher applied to this sort of text would rapidly approach unicity. Joyce, on the other hand, would be relatively ea[...]. Its small redundancy is disclosed by the difficulty of filling in correctly even a single missing letter from "Finnegans Wake". What the numerical value is would be difficult to determine; it varies widely throughout the book.
The mathematical extremes of redundancy, 0 and 100%, can be constructed in artificial languages. At 100% we have, e.g., a single possible message: Q(M) = 0 identically, and Q(K) in the random cipher case declines as rapidly as possible, i.e., as rapidly as one sends information on the system. In the other extreme all letter sequences are equally likely, and any closed ciphering system is ideal.
We may refer here to a memorandum by Nyquist ("Enciphering Effect of Redundancy in Language," May 30, 1944) in which some questions of the type we are considering here are discussed.
32. Distribution of Equivocation.
A more complete description of a secrecy system applied to a language than is afforded by the equivocation characteristics can be found by giving the distribution of equivocation. For N intercepted letters we consider the fraction of cryptograms for which Q (for these particular E's, not the mean Q) lies between certain limits. This gives a density distribution function P(Q, N), where P(Q, N) dQ is the probability that, for N letters, Q lies between the limits Q and Q + dQ. The mean equivocation we have previously studied is the mean of this distribution:
$$Q(N) = \int_0^\infty Q\, P(Q, N)\, dQ$$
The function P(Q, N) can be thought of as plotted along a third dimension, normal to the paper, on the Q, N plane. If the language is pure, with a small influence range (compared to K), and the cipher is pure, the function P(Q, N) will usually be a ridge in this plane whose highest point follows approximately the mean Q, at least until near the unicity point. In this case, or when the conditions are nearly verified, the mean Q curve gives a reasonably complete picture of the system.
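For the two-letter simple substitution of Section 22 the distribution P(Q, N) can be tabulated exactly, because a cryptogram's own equivocation depends only on its number s of 0's. This Python sketch (bits, names illustrative) returns the discrete distribution of the per-cryptogram equivocation, whose mean should reproduce the Q(N) formula of that section.

```python
import math

def binary_entropy(x):
    """Entropy in bits of a two-valued distribution (x, 1 - x)."""
    if x in (0.0, 1.0):
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def equivocation_distribution(p, n):
    """Map {Q_E: probability} over intercepted cryptograms of n letters.

    A cryptogram with s zeros has posterior P(identity) = a / (a + b), where
    a = p^s q^(n-s), b = q^s p^(n-s); the C(n, s) such cryptograms have total
    probability C(n, s)(a + b)/2."""
    q = 1.0 - p
    dist = {}
    for s in range(n + 1):
        a, b = p**s * q**(n - s), q**s * p**(n - s)
        qe = round(binary_entropy(a / (a + b)), 12)   # merge s and n - s
        dist[qe] = dist.get(qe, 0.0) + math.comb(n, s) * (a + b) / 2
    return dist
```

Plotting these point masses for successive N gives a discrete picture of the P(Q, N) surface described above.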
On the other hand, if the language is not pure, but made up of a set of pure components
$$L = \sum_i p_i L_i$$
having different equivocation curves with the system, say $Q_1, Q_2, \dots, Q_s$, then the total Q distribution will usually be made up of a series of ridges. There will be one for each $L_i$, weighted in accordance with its $p_i$. The mean equivocation characteristic will be a line somewhere in the midst of these ridges and may not give a very complete picture of the situation. This is shown in Fig. 21.
A similar effect occurs if the system is not pure but made up of several systems with different Q curves. There is then a series of ridges in the P(Q, N) plot, and the mean Q strikes an average which may lie between ridges and be a very improbable value of Q for a particular cryptogram. These effects are illustrated in Fig. 22.
The effect of mixing pure languages which are
near to one another in statistical structure is to increase
the width of the ridge. Near the unicity point this tends
to raise the mean equivocation, since equivocation cannot
become negative and the spreading is chiefly in the positive
direction. We expect, therefore, that in this region the calculations based on the random cipher should be somewhat low.
PART III
Practical Secrecy
33. The Work Characteristic
After the unicity point has been passed there will usually be a unique solution to the cryptogram. The problem of isolating this single solution of high probability is the problem of cryptanalysis. In the region before the unicity point we may say that the problem of cryptanalysis is that of isolating all the possible solutions of high probability (compared to the remainder) and determining their various probabilities.
Although it is always possible in principle to determine these solutions (by trial of each possible key, for example), different enciphering systems show a wide variation in the amount of work required. The average amount of work to determine the key for a cryptogram of N letters, W(N), measured say in man hours, may be called the work characteristic of the system. This average is taken over all messages and all keys with their appropriate probabilities.
For a simple substitution on English the work and equivocation characteristics would be somewhat as shown in Fig. 23. The dotted portion of the curve is where there are numerous possible solutions and these must all be determined. In the solid portion after the unicity point only one solution exists in general, but if only the minimum necessary data are given a great deal of work must be done to isolate it. As more material is used the work rapidly decreases toward some asymptotic value, where the additional data no longer reduce the labor.
This is the work characteristic for the key. It is clear that after the unicity point this function can never increase. There is also a work characteristic for the message: the average amount of work to determine the message (or all reasonable messages). This will, in ordinary cases, be below, or at any rate not far above, the work characteristic for the key, out to fairly large N, since generally if the key is determined it is easy to find M by the deciphering transformation. For very large N, however, this function will increase due merely to the labor of deciphering the large amount of intercepted material.
Essentially the behavior described above can be expected with any type of secrecy system where the equivocation approaches zero. The scale of man hours required, however, will differ greatly with different types of ciphers, even when the Q curves are about equal. A Vigenere or compound Vigenere, for example, with the same key size would have a much better (i.e., much higher) work characteristic. A good practical secrecy system is one in which the W(N) curve remains sufficiently high, out to the number of letters one expects to transmit with the key, to prevent the enemy from actually carrying out the solution, or to delay it to such an extent that the information is then obsolete.
We will consider in the following sections ways of keeping the function W(N) large, even though Q(K) may be practically zero. This is essentially a "max-min" type of problem, as is always the case when we have a battle of wits. In designing a good cipher we must maximize the minimum amount of work the enemy must do to break it. It is not enough merely to be sure none of the standard methods of cryptanalysis work; we must show that no method whatever will break the system easily. This has been the weakness of many systems: they were designed to resist all the known methods of solution, but had a structure leading to new methods which disclosed weaknesses of their own.
The problem of good cipher design is essentially one of finding difficult problems, subject to certain other conditions. This is a rather unusual situation, since one is ordinarily seeking the simple and easily soluble problems in a field.
In terms of von Neumann's "Theory of Games", the situation between the cipher designer and the cryptanalyst can be thought of as a "game" of a very simple structure: a zero-sum two-person game with complete information and just two "moves". The cipher designer chooses a system for his "move". The cryptanalyst is then informed of this choice and chooses a method of analysis. The "value" of the play is the average work required to break a cryptogram in the system by the method chosen.
How can we ever be sure that a system which is not ideal, and therefore has a unique solution for sufficiently large N, will require a large amount of work to break with every method of analysis? There are two approaches to this problem:
(1) We can study the possible methods of solution available to the cryptanalyst and attempt to describe them in sufficiently general terms to cover any methods he might use. We then construct our system to resist this "general" method of solution.
(2) We may construct our ciphers in such a way that breaking them is equivalent to (or requires at some point in the process) the solution of some problem known to be laborious. Thus, if we could show that solving a system requires at least as much work as solving a system of simultaneous equations in a large number of unknowns, of a complex type, then we would have a lower bound of sorts for the work characteristic.
The next three sections are aimed at these general problems. It is difficult to define the pertinent ideas involved with sufficient precision to obtain results in the form of mathematical theorems, but it is believed that the conclusions, in the form of general principles, are correct.
34. Generalities on the Solution of Cryptograms
After the unicity distance has been exceeded in intercepted material, any system can be solved in principle by merely trying each possible key until the unique solution is obtained, i.e., a deciphered message which "makes sense" in clear. A simple calculation shows that this method of solution (which we may call complete trial and error) is totally impractical except when the key is absurdly small.
Suppose, for example, we have a key of 26! possibilities, or about 26.3 digits, the same size as in simple substitution on English. This is, by any significant measure, a small key. It can be written on a small slip of paper, or memorized in a few minutes. It could be registered on 27 switches each having ten positions, or on 88 two-position switches.
Suppose further, to give the cryptanalyst every possible advantage, that he constructs an electronic device to try keys at the rate of one each microsecond (perhaps automatically selecting from the results by a $\chi^2$ test for statistical significance). He may expect to reach the right key about half way through, and after an elapsed time of about

$$\frac{26!}{2 \times 60^2 \times 24 \times 365 \times 10^6} \approx 3 \times 10^{12} \text{ years.}$$
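The arithmetic of the last two paragraphs can be checked with a short Python sketch. The helper names `key_digits` and `exhaustive_search_years` are illustrative, not from the text. Note that Python's exact value of 26! is about 4.0 × 10^26, so with it the sketch reports about 26.6 digits and roughly 6 × 10^12 years; the text's rounder figure of 2 × 10^26 reproduces its 3 × 10^12 years. Either figure leaves complete trial and error hopeless.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def key_digits(num_keys: int) -> float:
    """Key size in decimal digits, i.e. log10 of the number of keys."""
    return math.log10(num_keys)

def exhaustive_search_years(num_keys: float, trials_per_second: float = 1e6) -> float:
    """Expected years of complete trial and error, finding the key half way through."""
    return (num_keys / 2) / trials_per_second / SECONDS_PER_YEAR

n_keys = math.factorial(26)  # keys of a simple substitution alphabet
print(f"26! = {n_keys:.2e}")
print(f"key size: {key_digits(n_keys):.1f} digits, "
      f"{math.ceil(key_digits(n_keys))} ten-position switches")
print(f"years at one key per microsecond (exact 26!): {exhaustive_search_years(n_keys):.1e}")
print(f"years with the text's 2e26 keys:              {exhaustive_search_years(2e26):.1e}")
```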
In other words, even with a small key, complete trial and error will never be used in solving cryptograms, except in the trivial case where the key is extremely small, e.g., the Caesar cipher with only 26 possibilities, or 1.4 digits. The trial and error which is used so commonly in cryptography is of a different sort, or is augmented by other means. If one had a secrecy system which required complete trial and error it would be extremely safe. Such a system would result, it appears, if the original messages, all say of 1000 letters, were a random selection of $2^{RN}$ from the set of all $2^{R_0 N}$ sequences of 1000 letters. If any of the simple ciphers were applied to these it seems that little improvement over complete trial and error would be possible.
The methods actually used often involve a great deal of trial and error, but in a different way. First, the trials progress from more probable to less probable hypotheses, and, second, each trial disposes of a large group of keys, not a single one. Thus the key space may be divided into say 10 subsets, each containing about the same number of keys. By at most 10 trials one determines which subset is the correct one. This subset is then divided into several secondary subsets and the process repeated. With the same key size ($K = 26! \approx 2 \times 10^{26}$) we would expect about 26 × 5 or 130 trials, as compared to $10^{26}$ by complete trial and error. The possibility of choosing the most likely of the subsets first for test would improve this result even more. If the divisions were into two compartments (the best way) only 90 trials would be required. Whereas complete trial and error requires trials to the order of the number of keys, this subdividing trial and error requires only trials to the order of the key size in alternatives.
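A rough model of subdividing trial and error is sketched below. It assumes that each level splits the surviving keys into `branches` equal subsets which are tried one at a time, so that about `branches / 2` tests are spent per level (the text's "about 5" for ten subsets; with two subsets a single test settles each level). The function names are illustrative only.

```python
import math

def subdividing_trials(key_digits, branches):
    """Expected tests to isolate the key by repeated equal subdivision.

    Assumes each level splits the surviving keys into `branches` equal subsets
    tried one at a time, costing about branches / 2 tests per level
    (one test per level when branches == 2).
    """
    levels = key_digits * math.log(10) / math.log(branches)  # log_b(number of keys)
    return levels * branches / 2

def complete_trials(key_digits):
    """Expected tests when single keys are tried: half the key space."""
    return 10 ** key_digits / 2

digits = 26  # simple substitution, as in the text
print(f"ten-way splits: {subdividing_trials(digits, 10):.0f} tests")
print(f"two-way splits: {subdividing_trials(digits, 2):.0f} tests")
print(f"single keys:    {complete_trials(digits):.0e} tests")
```

With 26 digits the two-way split gives about 86 tests, the same order as the 90 quoted; what matters is that the count grows with the key size, not with the number of keys.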
This remains true even when the different keys have different probabilities. The proper procedure then, to minimize the expected number of trials, is to divide the key space into subsets of equiprobability. When the proper subset is determined, this is again subdivided into equiprobability subsets. If this process can be continued, the number of trials expected when each division is into two subsets will be

$$h = \frac{|K|}{\log 2}.$$

If each test has S possible results and each of these corresponds to the key being in one of S equiprobability subsets, then

$$h = \frac{|K|}{\log S}$$

trials will be expected. The intuitive significance of these results should be noted. In the two-compartment test with equiprobability, each test yields one alternative of information as to the key. If the subsets have very different probabilities, as in testing a single key in complete trial and error, only a small amount of information is obtained from the test. Thus with 26! equiprobable keys, a test of one yields only

$$-\left[\frac{26!-1}{26!}\log\frac{26!-1}{26!} + \frac{1}{26!}\log\frac{1}{26!}\right]$$

or about $10^{-25}$ alternatives of information. Dividing into S equiprobability subsets maximizes the information obtained from each trial at log S, and the expected number of trials is the total information to be obtained, that is the key size |K|, divided by this amount.
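The information figures above can be reproduced numerically. This sketch uses base-2 logarithms so that one unit is one "alternative", takes the key size to be $\log_2 26!$, and uses a small-probability expansion for the single-key test because $1 - 1/26!$ rounds to 1.0 in floating point. The function name `outcome_information` is illustrative.

```python
import math

def outcome_information(probs):
    """Information, in alternatives (bits), gained from a test whose outcomes
    have the given probabilities: -sum p log2 p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = math.factorial(26)
# Testing a single key: outcome probabilities 1/n and 1 - 1/n. In floating point
# 1 - 1/n rounds to 1.0, so use the small-p expansion (log2 n + log2 e) / n.
single_key = (math.log2(n) + math.log2(math.e)) / n
print(f"one key tested: {single_key:.1e} alternatives")

key_size = math.log2(n)  # total information to be obtained
for s in (2, 3, 10):
    per_test = outcome_information([1 / s] * s)  # log2 S, the most a test can give
    print(f"S = {s:2d}: {per_test:.3f} per test, about {key_size / per_test:.0f} tests")
```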
The question here is similar to various coin weighing problems that have been circulated recently. A typical example is the following: It is known that one coin in 27 is counterfeit, and slightly lighter than the rest. A chemist's balance is available and the counterfeit coin is to be isolated by a series of weighings. What is the least number of weighings to do this? The correct answer is 3, obtained by first dividing the coins into three groups of 9 each. Two of these are compared on the balance. The three possible results determine the set of 9 containing the counterfeit. This set is then divided into 3 subsets of 3 each and the process continued. The set of coins corresponds to the set of keys, the counterfeit coin to the correct key, and the weighing procedure to a trial or test.
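The weighing procedure translates directly into a three-way search. In this sketch (the function name and coin weights are illustrative), each weighing plays the part of one test with three equiprobable results, so 27 coins always need exactly $\log_3 27 = 3$ weighings.

```python
def find_light_coin(weights):
    """Isolate the one lighter coin with a balance by splitting into thirds.

    Each weighing compares two equal groups; its three outcomes (left lighter,
    right lighter, balance) leave a single third as candidates, just as one
    test with three results leaves one of three key subsets.
    Returns (index of the light coin, number of weighings).
    """
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = (len(candidates) + 2) // 3  # ceil(len / 3)
        left = candidates[:third]
        right = candidates[third:2 * third]
        rest = candidates[2 * third:]
        weighings += 1
        left_w = sum(weights[i] for i in left)
        right_w = sum(weights[i] for i in right)
        if left_w < right_w:
            candidates = left
        elif right_w < left_w:
            candidates = right
        else:
            candidates = rest
    return candidates[0], weighings

coins = [10] * 27
coins[17] = 9                   # the counterfeit is slightly lighter
print(find_light_coin(coins))   # (17, 3)
```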
This method of solution is feasible only if the key space can be divided into a small number of subsets, with a simple method of determining to which subset the correct key belongs. Stated another way, the method works when it is possible to solve for the key bit by bit. One does not need to assume a complete key in order to apply a consistency test and determine if the assumption is justified: an assumption on a part of the key (or as to whether the key is in some large section of the key space) can be tested.
This is one of the greatest weaknesses of most ciphering systems. For example, in simple substitution, an assumption on a single letter can be checked against its frequency, variety of contact, doubles or reversals, etc. In determining a single letter the key space is reduced by 1.4 digits from the original 26. The same effect is seen in all the elementary types of ciphers. In the Vigenère, the assumption of two or three letters of the key is easily checked by deciphering at other points with this fragment and seeing whether clear emerges. The compound Vigenère is much better from this point of view, if we assume a fairly large number of component periods, producing a repetition rate larger than will be intercepted. Here as many key letters are used in enciphering each letter as there are periods; although this is only a fraction of the entire key, at least a fair number of letters must be assumed before a consistency check can be applied.
Our first conclusion then, regarding practical small key cipher design, is that a considerable amount of key should be used in enciphering each small element of the message.
35. Statistical Methods
It is possible to solve many kinds of ciphers by statistical analysis. Consider again simple substitution. The first thing a cryptographer does with an intercepted cryptogram is to make a frequency count. If the cryptogram contains say 200 letters it is safe to assume that few, if any, letters are out of their frequency groups, this being a division into 4 sets of well defined frequency limits. The log of the number of keys within this limitation may be calculated as

$$\log\, 2!\, 9!\, 9!\, 6! = 14.28$$

and the simple frequency count thus reduces the key uncertainty by 12 digits, a tremendous gain.
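The 14.28 can be checked directly. The group sizes 2, 9, 9, 6 are read off the formula; which letters fall into which group is not specified here, and only permutations inside a group remain consistent with the count. With the exact value of 26! the reduction comes out at about 12.3 digits, in line with the text's 12.

```python
import math

# Group sizes read off the text's formula log 2! 9! 9! 6!; which letters
# fall in each group is not specified here.
groups = [2, 9, 9, 6]
assert sum(groups) == 26  # every letter belongs to exactly one group

# Only permutations inside a frequency group remain consistent with the count.
remaining = sum(math.log10(math.factorial(g)) for g in groups)
original = math.log10(math.factorial(26))
print(f"keys left: 10^{remaining:.2f}; uncertainty removed: {original - remaining:.1f} digits")
```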
In general, a statistical attack proceeds as follows. A certain statistic S is measured on the intercepted cryptogram E. This statistic is such that for all reasonable M it assumes about the same value, $S_K$, the value depending only on the particular key K that was used. The value thus obtained serves to limit the possible keys to those which would give values of S in the neighborhood of that observed. A statistic which does not depend on K, or which varies as much with M as with K, is not of value in limiting K. Thus in transposition ciphers the frequency count of letters gives no information about K: every K leaves this statistic the same. Hence one can make no use of a frequency count in breaking transposition ciphers.
More precisely, one can ascribe a "solving power" to a given statistic S. For each value of S there will be a conditional equivocation of the key $Q_S(K)$, the equivocation when S has its particular value and that is all that is known concerning the key. The weighted mean of these values

$$\sum_S P(S)\, Q_S(K)$$

gives the mean equivocation of the key when S is known, P(S) being the a priori probability of the particular value S. The key size |K|, less this mean equivocation, measures the "solving power" of S.
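The definition of solving power can be made concrete on a toy joint distribution of keys and statistic values. This is a minimal sketch assuming the key size is measured as the entropy of the key distribution in alternatives (bits) and that the joint probabilities P(K, S) are already known (in practice they come from averaging over messages). The names `entropy` and `solving_power` are illustrative.

```python
import math
from collections import defaultdict

def entropy(probs):
    """-sum p log2 p, in alternatives (bits)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def solving_power(joint):
    """joint maps (key, statistic_value) -> probability.

    Returns the key size H(K) less the mean equivocation sum_S P(S) Q_S(K).
    """
    p_key = defaultdict(float)
    p_stat = defaultdict(float)
    for (k, s), p in joint.items():
        p_key[k] += p
        p_stat[s] += p
    key_size = entropy(p_key.values())
    mean_equivocation = 0.0
    for s, ps in p_stat.items():
        conditional = [p / ps for (_, s2), p in joint.items() if s2 == s]
        mean_equivocation += ps * entropy(conditional)  # P(S) * Q_S(K)
    return key_size - mean_equivocation

# Eight equally likely keys; the statistic reveals which half the key is in.
halves = {(k, k // 4): 1 / 8 for k in range(8)}
print(solving_power(halves))   # one alternative of solving power
```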
In a strongly ideal cipher all statistics of the cryptogram are independent of the particular key used. This is the measure preserving property of $T_j T_k^{-1}$ on the E space or $T_j^{-1} T_k$ on the M space mentioned above.
There are good and poor statistics, just as there are good and poor methods of trial and error. Indeed the trial and error testing of a hypothesis is a type of statistic, and what was said above regarding the best types of trials holds generally. A good statistic for solving a system must have the following properties:

1. It must be simple to measure.
2. It must depend more on the key than on the message if it is meant to solve for the key. The variation with M should not mask its variation with K.
3. The values of the statistic that can be "resolved" in spite of the "fuzziness" produced by variation in M should divide the key space into a number of subsets of comparable probability, with the statistic specifying the one in which the correct key lies. The statistic should give us sizable information about the key, not a tiny fraction of an alternative.
4. The information it gives must be simple and usable. Thus the subsets in which the statistic locates the key must be of a simple nature in the key space.
Frequency count for simple substitution is an example of a very good statistic.
Two methods (other than recourse to ideal systems) suggest themselves for frustrating a statistical analysis. These we may call the methods of diffusion and confusion. In the method of diffusion the statistical structure of M which leads to its redundancy is "dissipated" into long range statistics, i.e., into statistical structure involving long combinations of letters in the cryptogram. The effect here is that the enemy must intercept a tremendous amount of material to tie down this structure, since the structure is evident only in blocks of small individual probability. Furthermore, even when he has sufficient material, the analytical work required is much greater, since the redundancy has been diffused over a large number of individual statistics. An example of diffusion of statistics is operating on a message $M = m_1, m_2, m_3, \dots$ with a "smoothing" operation, e.g.,

$$y_n = \sum_{i=1}^{s} m_{n+i} \pmod{26},$$

adding s successive letters of the message to get a letter $y_n$. One can show that the redundancy of the y sequence is the same as that of the m sequence, but the structure has been dissipated. Thus the letter frequencies in y will be more nearly equal than in m, the digram frequencies also more nearly equal, etc. Indeed any reversible operation which produces one letter out for each letter in and does not have an infinite "memory" has an output with the same redundancy as the input. The statistics can never be eliminated without compression, but they can be spread out.
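The smoothing operation and its reversibility can be sketched as follows. The sketch assumes letters coded as integers 0 to 25 and 0-based indexing, so the first output uses $m_1, \dots, m_s$; `unsmooth` shows that nothing is lost once the first s letters (the finite "memory") are known. All names are illustrative, and the frequency comparison printed at the end is only a crude indicator of flattening.

```python
from collections import Counter

def smooth(m, s):
    """Diffusing 'smoothing': y[n] = m[n+1] + ... + m[n+s] (mod 26).

    Letters are integers 0..25; indices are 0-based, so the first output
    uses m[1..s]. Only complete windows are produced.
    """
    return [sum(m[n + 1 : n + 1 + s]) % 26 for n in range(len(m) - s)]

def unsmooth(y, head, s):
    """Invert smooth() given the first s message letters m[0..s-1].

    Each step solves y[n] for its newest letter:
    m[n+s] = y[n] - (m[n+1] + ... + m[n+s-1]) (mod 26).
    """
    m = list(head)
    for n, yn in enumerate(y):
        m.append((yn - sum(m[n + 1 : n + s])) % 26)
    return m

def to_nums(text):
    return [ord(c.upper()) - ord("A") for c in text if "A" <= c.upper() <= "Z"]

def top_share(seq):
    """Relative frequency of the commonest letter: a crude unevenness measure."""
    return Counter(seq).most_common(1)[0][1] / len(seq)

m = to_nums("the method of diffusion spreads the statistics of the message "
            "over long combinations of letters in the cryptogram")
y = smooth(m, 5)
print(f"commonest letter share: message {top_share(m):.3f}, smoothed {top_share(y):.3f}")
assert unsmooth(y, m[:5], 5) == m  # nothing is lost, given the first letters
```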
The method of confusion is to make the relation between the simple statistics of E and the simple description of K a very complex and involved one. In the case of simple substitution it was easy to describe the limitation of K imposed by the letter frequencies of E. If the connection is very involved and confused, the enemy can still evaluate a statistic $S_1$, say, which limits the key to a region of the key space. This limitation, however, is to some complex region $R_1$ in the space, folded over many times, and he has a difficult time making use of it. A second statistic $S_2$ limits K still further to $R_2$, hence it lies in the intersection region $R_1 R_2$, but this does not help much because it is so difficult to determine just what the intersection is.
To be more precise, let us suppose the key space has certain "natural coordinates" $k_1, k_2, \dots, k_p$ which he wishes to determine. He measures a set of statistics $s_1, s_2, \dots, s_n$, and these are sufficient to determine the $k_i$. However, in the method of confusion, the equations connecting these sets of variables are involved and complex. We have, say,

$$f_1(k_1, k_2, \dots, k_p) = s_1, \quad f_2(k_1, k_2, \dots, k_p) = s_2, \quad \dots, \quad f_n(k_1, k_2, \dots, k_p) = s_n,$$

and all the $f_i$ involve all the $k_i$. The cryptographer must solve this system simultaneously, a difficult job. In the simple (not confused) cases the functions involve only a small number of the $k_i$, or at least some of them do. One first solves the simpler equations, evaluating some of the $k_i$, and substitutes these in the more complicated equations.
The conclusion here is that for a good ciphering system steps should be taken either to diffuse or confuse the redundancy (or both).
36. The Probable Word Method
One of the most powerful tools for breaking ciphers is the use of probable words. The probable words may be words or phrases expected in the particular message due to its source, or they may merely be common words or syllables which occur in any text in the language, such as the, and, tion, that, etc.
In general, the probable word method is used as follows. Assuming a probable word to be at some point in the clear, the key or a part of the key is determined. This is used to decipher other parts of the cryptogram and provide a consistency test. If the other parts come out in clear, the assumption is justified.
There are few of the classical type ciphers that use a small key and can resist long under a probable word analysis. From a consideration of this method we can frame a test of ciphers which might be called the acid test. It applies only to ciphers with a small key (less than say 50 digits), applied to natural languages, and not using the ideal method of gaining secrecy. The acid test is this: How difficult is it to determine the key or a part of the key, knowing a sample of message and corresponding cryptogram? Any system in which this is easy cannot be very resistant, for the cryptanalyst can always make use of probable words, combined with trial and error, until a consistent solution is obtained.
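The probable word method against a simple Vigenère can be sketched in a few lines. The message, key, crib, and helper names below are invented for illustration, and the period (5) is assumed to be known already. For this cipher the third relation K = h(M, E) is simply E - M, which is why the Vigenère fails the acid test.

```python
A = ord("A")

def vigenere(text, key, decrypt=False):
    """Classical Vigenère on A-Z text: add (or subtract) the repeating key mod 26."""
    sign = -1 if decrypt else 1
    return "".join(
        chr((ord(c) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, c in enumerate(text)
    )

def key_from_crib(cipher, crib, pos, period):
    """Assume `crib` is the clear starting at `pos`; return {key position: letter},
    or None if the assumption forces one key position to two different letters."""
    key = {}
    for j, p in enumerate(crib):
        i = pos + j
        k = chr((ord(cipher[i]) - ord(p)) % 26 + A)  # K = h(M, E) is just E - M
        if key.setdefault(i % period, k) != k:
            return None
    return key

def partial_decrypt(cipher, key, period):
    """Decipher the positions whose key letter is known; show '.' elsewhere."""
    return "".join(
        chr((ord(c) - ord(key[i % period])) % 26 + A) if i % period in key else "."
        for i, c in enumerate(cipher)
    )

cipher = vigenere("WEMUSTATTACKTHEBRIDGEATDAWN", "LEMON")
crib = "ATTACK"
for pos in range(len(cipher) - len(crib) + 1):
    key = key_from_crib(cipher, crib, pos, period=5)
    if key is not None:  # survived the internal consistency check
        print(pos, partial_decrypt(cipher, key, period=5))
```

At the true position (6) the six-letter crib fixes all five key letters, so the whole message comes out in clear; any other position that survives would typically decipher to gibberish and be rejected, which is the consistency test described above.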
The conditions on the size of the key make the amount of trial and error small, and the condition about ideal systems is necessary, since these automatically give consistency checks. The existence of probable words and phrases is implied by the condition of natural languages. Conversely, it seems reasonable that if the key is difficult to obtain, knowing a text and its cryptogram, then the system should be strong.
Note that this requirement by itself is not contradictory to the requirements that enciphering and deciphering be simple processes. Using functional notation we have for enciphering

$$E = f(K, M)$$

and for deciphering

$$M = g(K, E).$$

Both of these may be simple operations on their arguments without the third equation

$$K = h(M, E)$$

being simple.
We may also point out that in investigating a new type of ciphering system one of the best methods of attack is to consider how the key could be determined if a sufficient amount of M and E were given.
With a small key, the work required to solve a system, given a large amount of data, may be expected to be not more than a few orders of magnitude greater than the work required to obtain the key from a small amount of data when both M and E are known.
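The same principle of confusion can be (and must be) used here to create difficulties for the cryptanalyst. Given $M = m_1, m_2, \dots, m_s$ and $E = e_1, e_2, \dots, e_s$, the cryptanalyst can set up equations for the different key elements $k_1, k_2, \dots, k_r$ (namely the enciphering equations):

$$\begin{aligned}
e_1 &= f_1(m_1, m_2, \dots, m_s;\ k_1, \dots, k_r)\\
e_2 &= f_2(m_1, m_2, \dots, m_s;\ k_1, \dots, k_r)\\
&\ \ \vdots\\
e_s &= f_s(m_1, m_2, \dots, m_s;\ k_1, \dots, k_r)
\end{aligned}$$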
All is known, we assume, except the $k_i$. Each of these equations should therefore be complex in the $k_i$, and involve many of them. Otherwise the enemy can solve the simple ones and then the more complex ones by substitution.
From the point of view of increasing confusion, it is desirable to have the $f_i$ involve several $m_i$, especially if these are not adjacent and hence less correlated. This introduces the undesirable feature of error propagation, however, for then each $e_i$ will generally affect several $m_i$ in deciphering, and an error will spread to all these.
We conclude that much of the key should be used in an involved manner in obtaining any cryptogram letter from the message, to keep the work characteristic high. Further, a dependence on several uncorrelated $m_i$ is desirable, if some propagation of error can be tolerated. We are led by all three of the arguments of these sections to consider "mixing transformations."
37. Mixing Transformations
A notion that has proven valuable in certain branches of probability theory is the concept of a "mixing transformation." Suppose we have a probability or measure space Ω and a measure preserving transformation T of the space into itself, i.e., a transformation such that the measure of a transformed region TR is equal to the measure of the initial region R. The transformation is called mixing if, for any function f defined over the space and any region R,

$$\lim_{n \to \infty} \int_{T^n R} f(P)\, dP = \int_R dP \int_\Omega f(P)\, dP.$$

This means that any initial region R of the space is, under successive applications of T, mixed into the entire space Ω with uniform density. In general $T^n R$ becomes a region consisting of a large number of thin filaments spread throughout Ω. As n increases the filaments become finer and their density more nearly constant.
An example of a mixing transformation is shown in Fig. 21. Here measure is identified with Euclidean area. The space is the triangle, and TP is the point a fixed distance above point P, provided this does not go outside the triangle. When the top of the triangle is reached, a point is transferred first to the point directly beneath, and then over to the right an irrational fraction of the base width. If this carries the point beyond the right edge, the extra distance is measured from the left edge. Successive transforms of a square region are shown in Fig. 21. For n very large the square is turned into a uniform grating of nearly parallel thin strips covering the triangle.
A mixing transformation in this precise sense can occur only in a space with an infinite number of points, for in a finite point space the transformation must be periodic. Speaking loosely, however, we can think of a mixing transformation as one which distributes any reasonably cohesive region in the space fairly uniformly over the entire space. If the first region could be described in simple terms, the second would require very complex ones. In the case of cryptographic interest, the original region is all of a certain simple statistical structure; after the mix the region is distributed and the structure diffused and confused.
Good mixing transformations are often formed by repeated products of two simple non-commuting operations. See for example the mixing of pastry dough discussed by Hopf.* The dough is first rolled out into a thin slab, then folded over, then rolled, and then folded again, etc.
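The roll-and-fold idea can be illustrated with the baker's map, a standard idealization of the dough mixing just described. This swaps in a different example from the triangle map of Fig. 21, and the starting square and 4 × 4 grid are arbitrary choices. Exact `Fraction` arithmetic is used because floating point would lose one binary digit per fold; on these rational points the orbits are eventually periodic, echoing the remark above about finite point spaces.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)

def baker(x, y):
    """One roll-and-fold of the unit square (the baker's map).

    Rolling doubles the width and halves the height, which keeps area;
    the right half is then cut off and stacked on top of the left half.
    """
    if x < HALF:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def iterate(point, n):
    for _ in range(n):
        point = baker(*point)
    return point

def occupied_cells(points, grid=4):
    """Number of cells in a grid x grid partition holding at least one point."""
    return len({(math.floor(x * grid), math.floor(y * grid)) for x, y in points})

# A small square of exact rational points in [0, 1/4) x [0, 1/4).
square = [(Fraction(i, 28), Fraction(j, 28)) for i in range(1, 7) for j in range(1, 7)]
for n in range(7):
    print(n, occupied_cells([iterate(p, n) for p in square]), "of 16 cells occupied")
```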
In a good mixing transformation of a space with natural coordinates $X_1, X_2, \dots, X_S$, the point $X_i$ is carried by the transformation into a point $X_i'$, with

$$X_i' = f_i(X_1, X_2, \dots, X_S), \qquad i = 1, 2, \dots, S,$$

and the functions $f_i$ are complicated, involving all the variables in a "sensitive" way. A small variation of any one, $X_3$ say, changes all the $X_i'$ considerably. If $X_3$ passes through its range of possible variation, the point $X'$ traces a long winding path around the space.
Various methods of mixing applicable to statistical sequences of the type found in natural languages can be devised. One which looks fairly good is to follow a preliminary transposition by a sequence of alternating substitutions and simple linear operations, adding adjacent letters mod 26 for example. Thus

$$H = LSLSLT,$$

where T is a transposition, L is a linear operation, and S is a substitution.
*E. Hopf, On Causality, Statistics and Probability, Journal of Math. and Physics, Vol. 13, pp. 51-102, 1934.
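A toy version of the mix H = LSLSLT on letters 0 to 25 is sketched below. The concrete components are hypothetical choices, since the text names only the kinds of operation: T reverses the block, S is one fixed substitution drawn with a seeded shuffle, and L replaces each letter by the running sum (mod 26) of the letters so far, an invertible way of adding adjacent letters.

```python
import random

N = 26  # alphabet size

# Hypothetical fixed components: the text names the operations, not their tables.
S_TABLE = list(range(N))
random.Random(1945).shuffle(S_TABLE)          # S: one fixed simple substitution
S_INV = [S_TABLE.index(c) for c in range(N)]  # its inverse alphabet

def T(x):
    """Transposition: here simply reverse the order of the block."""
    return x[::-1]

def S(x):
    return [S_TABLE[c] for c in x]

def S_inv(x):
    return [S_INV[c] for c in x]

def L(x):
    """Linear step: each output is the running sum (mod 26) of the letters so far."""
    out, acc = [], 0
    for c in x:
        acc = (acc + c) % N
        out.append(acc)
    return out

def L_inv(y):
    return [(y[i] - (y[i - 1] if i else 0)) % N for i in range(len(y))]

def H(x):
    """H = LSLSLT, applied right to left: T first, then L, S, L, S, L."""
    return L(S(L(S(L(T(x))))))

def H_inv(y):
    return T(L_inv(S_inv(L_inv(S_inv(L_inv(y))))))

msg = [ord(c) - ord("A") for c in "ATTACKATDAWN"]
assert H_inv(H(msg)) == msg
for pos in (0, len(msg) - 1):
    alt = msg.copy()
    alt[pos] = (alt[pos] + 1) % N
    changed = sum(a != b for a, b in zip(H(msg), H(alt)))
    print(f"change at position {pos}: {changed} of {len(msg)} output letters differ")
```

With this forward-only L, a change in the first message letter (moved to the end by T) alters only the last output letter, while a change in the last letter can spread through the whole block; a mix meant to be "sensitive" in every variable would need steps that spread in both directions.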
38. Ciphers of the Type THS
Suppose that H is a good mixing transformation that can be applied to sequences of letters, and that T and S are any two simple families of transformations, i.e., two simple ciphers, which may be the same. For concreteness we may think of them both as simple substitutions.
It appears that the cipher THS will be a very good ciphering system from the standpoint of its work characteristic. In the first place, it is clear on reviewing our arguments about statistical methods that no simple statistics will give information about the key: any significant statistics derived from E must be of a highly involved and very sensitive type, since the redundancy has been both diffused and confused by the mixing transformation H. Also, probable words lead to a complex system of equations involving all parts of the key (when the mix is good), which must be solved simultaneously. The bad features of such a system are the propagation of errors and the complexity of operations, both of which get worse as the mixing of H gets better.
It is interesting to note that if the cipher T is omitted, the remaining system is similar to S and thus no stronger. The enemy merely "unmixes" the cryptogram by application of $H^{-1}$ and then solves. If S is omitted, the remaining system is much stronger than T alone if the mix is good, but still not comparable to THS.
The basic principle here of simple ciphers separated by a mixing transformation can of course be extended. For example one could use

$$T_k H_1 S_j H_2 R_l$$

with two mixes and three simple ciphers. One can also simplify by using the same ciphers, and even the same keys (inner product), as well as the same mixing transformations. This might well simplify the mechanization of such systems.
The mixing transformation which separates the two (or more) appearances of the key acts as a kind of barrier for the enemy: it is easy to carry a known element over this barrier, but an unknown (the key) does not go easily.
By supplying two sets of unknowns, the key for S and the key for T, and separating them by the mixing transformation H, we have "tangled" the unknowns together in a way that makes solution very difficult.
Although systems constructed on this principle would be extremely safe, they possess one grave disadvantage. If the mix is good then the propagation of errors is bad. A transmission error of one letter will affect several letters on deciphering.
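A toy THS cipher makes both the construction and the error propagation concrete. Everything here is a hypothetical illustration: the two simple ciphers are substitutions chosen by integer keys through Python's `random` module (not a secure generator), and `mix` stands in for H as forward-then-backward running sums mod 26, so that in blocks under 26 letters every output letter is a sum involving every input letter.

```python
import random

N = 26

def keyed_substitution(key):
    """A simple substitution alphabet chosen by an integer key, with its inverse.
    (Python's random module is used only for illustration; it is not secure.)"""
    table = list(range(N))
    random.Random(key).shuffle(table)
    inverse = [0] * N
    for i, v in enumerate(table):
        inverse[v] = i
    return table, inverse

def mix(x):
    """Hypothetical stand-in for H: running sums mod 26 forward, then backward."""
    fwd, acc = [], 0
    for c in x:
        acc = (acc + c) % N
        fwd.append(acc)
    out, acc = [], 0
    for c in reversed(fwd):
        acc = (acc + c) % N
        out.append(acc)
    return out[::-1]

def unmix(y):
    """Inverse of mix(): undo the backward sums, then the forward sums."""
    rev = y[::-1]
    fwd = [(rev[i] - (rev[i - 1] if i else 0)) % N for i in range(len(rev))][::-1]
    return [(fwd[i] - (fwd[i - 1] if i else 0)) % N for i in range(len(fwd))]

def encrypt(m, k, j):
    """E = T_k H S_j M: substitute with key j, mix, substitute with key k."""
    t, _ = keyed_substitution(k)
    s, _ = keyed_substitution(j)
    return [t[c] for c in mix([s[c] for c in m])]

def decrypt(e, k, j):
    _, t_inv = keyed_substitution(k)
    _, s_inv = keyed_substitution(j)
    return [s_inv[c] for c in unmix([t_inv[c] for c in e])]

msg = [ord(c) - ord("A") for c in "ATTACKATDAWNFROMTHENORTH"]
e = encrypt(msg, k=11, j=29)
assert decrypt(e, k=11, j=29) == msg

garbled = e.copy()
garbled[10] = (garbled[10] + 1) % N  # a single transmission error
wrong = sum(a != b for a, b in zip(decrypt(garbled, k=11, j=29), msg))
print(wrong, "deciphered letters are wrong")
```

Here one wrong cryptogram letter spoils two or three deciphered letters, because the inverse of this particular mix only takes differences of neighbours; a mix whose inverse also spreads widely would spoil more, which is the trade-off the text describes.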
39. The Compound Vigenère
In the compound Vigenère several keys of lengths $d_1, d_2, \dots, d_S$ are written under the message and added to it modulo 26 to obtain the cryptogram. The result is a Vigenère with a key of special type, whose repetition is of period d, the least common multiple of $d_1, d_2, \dots, d_S$. If we have three keys of periods 2, 3, 5, the total period is 30 and the total key size is (2 + 3 + 5) × 1.41 = 14.1 digits. The situation is then
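$$\begin{array}{rcccccccc}
M &=& m_1 & m_2 & m_3 & m_4 & m_5 & m_6 & \cdots\\
K_1 &=& a_1 & a_2 & a_1 & a_2 & a_1 & a_2 & \cdots\\
K_2 &=& b_1 & b_2 & b_3 & b_1 & b_2 & b_3 & \cdots\\
K_3 &=& c_1 & c_2 & c_3 & c_4 & c_5 & c_1 & \cdots\\
E &=& e_1 & e_2 & e_3 & e_4 & e_5 & e_6 & \cdots
\end{array}$$

with

$$e_1 = m_1 + a_1 + b_1 + c_1, \qquad e_2 = m_2 + a_2 + b_2 + c_2, \qquad \text{etc.} \pmod{26}$$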
If we assume M and E known then, letting $h_i = e_i - m_i$ (mod 26), we have

$$\begin{array}{ll}
a_1 + b_1 + c_1 = h_1 & a_2 + b_3 + c_1 = h_6\\
a_2 + b_2 + c_2 = h_2 & a_1 + b_1 + c_2 = h_7\\
a_1 + b_3 + c_3 = h_3 & a_2 + b_2 + c_3 = h_8\\
a_2 + b_1 + c_4 = h_4 & a_1 + b_3 + c_4 = h_9\\
a_1 + b_2 + c_5 = h_5 & a_2 + b_1 + c_5 = h_{10}
\end{array}$$
These equations are easily solved for the key, although not as easily as in the simple Vigenère or other simple ciphers. As the number of constituent periods increases, the solution becomes more involved and time consuming. In any case we have a system of simultaneous equations, each involving S of the total of $B = \sum d_i$ unknowns. The unicity point will occur at about 2B letters, and if several times this amount of material is intercepted no great difficulty should be encountered in breaking the cipher, provided S is not more than say 6 or 8. With the first 9 primes as periods we have a key size of 100 letters, or about 141 digits; the unicity distance is about 200 letters, and the key does not repeat for 223,092,870 letters. This system, although much better than such methods as simple substitution, transposition and simple Vigenère with equivalent key size, does not utilize the available key fully in making the cryptanalyst work for the solution. The equations involve only S of the B key unknowns, and those in a simple fashion. The equations easily combine and reduce to eliminate unknowns. If a large amount of material is available, compared to the unicity distance, particular sets of equations can be combined to eliminate unknowns very easily. The system possesses the important advantage, however, of not expanding errors. One incorrect letter of cryptogram produces one incorrect letter of deciphered text.
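The compound Vigenère and the known-plaintext equations above can be generated mechanically. In this sketch the message, the key values and the helper names are invented for illustration; it sets up the equations $h_i = e_i - m_i$ that the text lists but does not solve them, and it checks the period and key-size figures quoted for three periods and for the first nine primes.

```python
import math
import string
from functools import reduce

def lcm(*ns):
    return reduce(lambda a, b: a * b // math.gcd(a, b), ns)

def compound_keystream(keys, length):
    """Add several periodic keys (lists of numbers 0..25) position by position, mod 26."""
    return [sum(k[i % len(k)] for k in keys) % 26 for i in range(length)]

def encipher(m, keys):
    return [(a + b) % 26 for a, b in zip(m, compound_keystream(keys, len(m)))]

def key_equations(m, e, periods):
    """Known-plaintext equations h_i = e_i - m_i (mod 26), one per letter,
    naming the key elements a_?, b_?, c_?, ... that each one sums."""
    rows = []
    for i, (mi, ei) in enumerate(zip(m, e)):
        terms = " + ".join(f"{name}{i % d + 1}"
                           for name, d in zip(string.ascii_lowercase, periods))
        rows.append(f"{terms} = h{i + 1} = {(ei - mi) % 26}")
    return rows

periods = (2, 3, 5)
keys = [[4, 19], [2, 7, 11], [0, 5, 9, 13, 22]]  # illustrative key values
m = [ord(c) - ord("A") for c in "SENDMORESUPPLIES"]
e = encipher(m, keys)
print("period", lcm(*periods), "| key size",
      round(sum(periods) * math.log10(26), 1), "digits")
print("\n".join(key_equations(m, e, periods)[:10]))

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
print(sum(primes), "key letters; repeats after", lcm(*primes), "letters")
```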
By relatively simple changes this system could be strengthened considerably. If the equations for the key elements (with M and E known) could be made into higher degree equations rather than linear ones, the difficulty of solution would increase tremendously. This could easily be done in a mechanical device by successive multiplications (mod 26) of the key letters according to some prearranged scheme.
40. Incompatibility of the Criteria for Good Systems
The five criteria for good secrecy systems given in section 12 appear to have a certain incompatibility when applied to a natural language with its complicated statistical structure. With artificial languages having a simple statistical structure it is possible to satisfy all requirements simultaneously, by means of the ideal type ciphers. In natural languages it seems that a compromise must be made and the valuations balanced against one another with a view toward the particular application.
If any one of the five criteria is dropped, the other four can be satisfied fairly well, as the following examples show.

1. If we omit the first requirement (amount of secrecy), any simple cipher such as simple substitution will do. In the extreme case of omitting this condition completely, no cipher at all is required and one sends the clear.
2. If the size of the key is not limited, the Vernam system can be used.
3. If complexity of operation is not limited, various extremely complicated types of enciphering process can be used. The modified compound Vigenère described above, with many different periods compounded, is fairly satisfactory as an example here, although it falls down somewhat on the key size condition. Ideal systems and enciphered codes are also fair examples, although not too good from the propagation of error point of view.
4. If we omit the propagation of error condition, systems of the type THS would be very good, although somewhat complicated.
5. If we allow large expansion of message, various systems are easily devised where the "correct" message is mixed with many "incorrect" ones (misinformation). The key determines which of these is correct.
A rough argument for the incompatibility of the five conditions may be given as follows.

From condition 5, secrecy systems essentially as studied in this paper must be used; i.e., no great use of nulls, etc. Perfect and ideal systems are excluded by condition 2 and by conditions 3 and 4, respectively. The high secrecy required by condition 1 must then come from a high work characteristic, not from a high equivocation characteristic. If the key is small, the system simple, and the errors do not propagate, probable word methods will generally solve the system fairly easily, since we then have a fairly simple system of equations for the key.
This reasoning is too vague to be conclusive, but the general idea seems quite reasonable. Perhaps if the various criteria could be given quantitative significance, some sort of an exchange equation could be found involving them and giving the best physically compatible sets of values. The two most difficult to measure numerically are the complexity of operations and the complexity of statistical structure of the language.
Appendix 1
Deduction of $-\sum p_i \log p_i$
It will be shown that the measure of choice $-\sum p_i \log p_i$ is a logical consequence of three quite reasonable assumptions about the desired properties of such a measure. The three assumptions are:
(1) There exists a function $C(p_1, p_2, \dots, p_n)$, continuous in the $p_i$, measuring the amount of "choice" when there are n possibilities with probabilities $p_1, \dots, p_n$.

(2) C has the property that if a given choice be broken down into two successive choices, the total amount of choice is the weighted sum of the individual choices. For example, suppose the choice is from 4 possibilities A, B, C, D with probabilities .1, .2, .3, .4. This can be broken down into a preliminary choice between the pair A, B and the pair C, D. Pair A, B has a total probability .1 + .2 = .3 and pair C, D probability .3 + .4 = .7. If pair A, B is chosen, a second choice between A and B must be made with probabilities $\frac{.1}{.1 + .2} = \frac{1}{3}$ and $\frac{.2}{.1 + .2} = \frac{2}{3}$. If pair C, D is chosen, a second choice between C and D must be made with probabilities $\frac{3}{7}$ and $\frac{4}{7}$. Thus broken down we have a preliminary amount of choice $C(.3, .7)$, and .3 of the time a secondary choice of $C(\frac{1}{3}, \frac{2}{3})$, while .7 of the time the secondary choice is $C(\frac{3}{7}, \frac{4}{7})$. Our condition requires that the total choice $C(.1, .2, .3, .4)$ be the same as the weighted sum of the different choices when decomposed, weighted in accordance with the frequency of occurrence. Thus we require in this case

$$C(.1, .2, .3, .4) = C(.3, .7) + .3\, C\!\left(\tfrac{1}{3}, \tfrac{2}{3}\right) + .7\, C\!\left(\tfrac{3}{7}, \tfrac{4}{7}\right).$$
(3) If $A(n) = C(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n})$ is the choice when there are n equally likely possibilities, then A(n) is monotonic increasing in n.
Theorem. Under these three assumptions

$$C(p_1, p_2, \dots, p_n) = -K \sum p_i \log p_i,$$

where K is a positive constant.
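A quick numerical check that the expression in the theorem satisfies assumption (2) for the example given, and assumption (3). The sketch takes K = 1 with base-2 logarithms, an arbitrary choice of unit; the function name `choice` is illustrative.

```python
import math

def choice(*probs):
    """C(p_1, ..., p_n) = -sum p_i log2 p_i, i.e. K = 1 with base-2 logarithms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

lhs = choice(.1, .2, .3, .4)
rhs = choice(.3, .7) + .3 * choice(1 / 3, 2 / 3) + .7 * choice(3 / 7, 4 / 7)
print(f"{lhs:.6f} {rhs:.6f}")  # the two sides of the decomposition agree

# A(n) = C(1/n, ..., 1/n) = log2 n, increasing in n as assumption (3) requires.
print([round(choice(*[1 / n] * n), 3) for n in range(2, 6)])
```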
From condition (2) we can decompose a choice from $s^m$ equally
likely possibilities into a series of m choices each from s
equally likely possibilities and obtain
$$A(s^m) = m\,A(s).$$
Similarly
$$A(t^n) = n\,A(t).$$
We can choose n arbitrarily large and find an m to satisfy
$$s^m \le t^n < s^{m+1}.$$
Thus, taking logarithms and dividing by $n \log s$,
$$\frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n}
\quad\text{or}\quad
\left|\frac{m}{n} - \frac{\log t}{\log s}\right| < \epsilon,$$
where $\epsilon$ is arbitrarily small.
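The bracketing step can be made concrete with integers. The sketch below finds, for given s, t and n, the m with $s^m \le t^n < s^{m+1}$. It uses exact integer arithmetic so that no rounding enters. It then shows m/n closing in on log t / log s as n grows. The function name `bracket` and the values s = 2, t = 3 are illustrative choices of ours.

```python
from math import log

def bracket(s, t, n):
    """Return the integer m with s**m <= t**n < s**(m + 1)."""
    target = t ** n          # exact integer, no rounding
    m = 0
    while s ** (m + 1) <= target:
        m += 1
    return m

s, t = 2, 3
for n in (1, 10, 100, 1000):
    m = bracket(s, t, n)
    # m/n <= log t / log s < m/n + 1/n, so the gap is below 1/n
    print(n, m, m / n, log(t) / log(s))
```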
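Now from the monotonic property of A(n),
$$A(s^m) \le A(t^n) \le A(s^{m+1}),$$
$$m\,A(s) \le n\,A(t) \le (m + 1)\,A(s).$$
Hence, dividing by $n\,A(s)$,
$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n}
\quad\text{or}\quad
\left|\frac{m}{n} - \frac{A(t)}{A(s)}\right| < \epsilon,$$
$$\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| \le 2\epsilon, \qquad A(t) = K \log t,$$
where K must be positive to satisfy (3).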
Now suppose we have a choice from n possibilities with commensurable
probabilities $p_i = n_i / \sum n_i$, where the $n_i$ are integers. We
can break down a choice from $\sum n_i$ possibilities into a choice
from n possibilities with probabilities $p_1, \ldots, p_n$ and then, if
the ith was chosen, a choice from $n_i$ with equal probabilities.
Using condition (2) again, we equate the total choice from $\sum n_i$
as computed by two methods:
$$K \log \sum n_i = C(p_1, \ldots, p_n) + K \sum p_i \log n_i.$$
Hence
$$C = K\left[\sum p_i \log \sum n_i - \sum p_i \log n_i\right]
= -K \sum p_i \log \frac{n_i}{\sum n_i} = -K \sum p_i \log p_i.$$
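The two ways of computing the total choice can be compared directly for a few integer weights. The sketch below again assumes K = 1 and base-2 logarithms. The helper name `two_ways` is hypothetical.

```python
from math import log2

def choice(probs):
    """C(p) = -sum(p_i log2 p_i), i.e. K = 1 with base-2 logs."""
    return -sum(p * log2(p) for p in probs if p > 0)

def two_ways(counts):
    """Total choice among sum(n_i) equally likely outcomes, computed directly
    and via the two-stage decomposition with p_i = n_i / sum(n_j)."""
    total = sum(counts)
    probs = [n / total for n in counts]
    direct = log2(total)                                   # K log(sum n_i)
    staged = choice(probs) + sum(p * log2(n) for p, n in zip(probs, counts))
    return direct, staged

print(two_ways([1, 2, 3, 4]))
```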
If the $p_i$ are incommensurable, they may be approximated by
rationals and the same expression must hold by our continuity
assumption. The choice of K is a matter of convenience and amounts
to the choice of a unit of measure.
Appendix 2
Proof of Theorem 4
Select any message $M_1$ and group together all crypto-
grams that can be obtained from $M_1$ by an enciphering operation
$T_i$. Let this class of cryptograms be $C_1'$. Group with $M_1$ all
messages that can be obtained from $M_1$ by $T_i^{-1} T_j M_1$, and call
this class $C_1$. The same $C_1'$ would be obtained if we started with
any other M in $C_1$, since
$$T_s T_j^{-1} T_i M_1 = T_l M_1.$$
Similarly the same $C_1$ would be obtained.
Choosing an M (if any exist) not in $C_1$ we construct
$C_2$ and $C_2'$ in the same way. Thus we obtain the residue classes
with properties (1) and (2). Let $M_1$ and $M_2$ be in $C_1$ and suppose
$$M_2 = T_1^{-1} T_2 M_1.$$
If $E_1$ is in $C_1'$ and can be obtained from $M_1$ by
$$E_1 = T_\alpha M_1 = T_\beta M_1 = \cdots = T_\eta M_1,$$
then
$$E_1 = T_\alpha T_2^{-1} T_1 M_2 = T_\beta T_2^{-1} T_1 M_2 = \cdots = T_\lambda M_2 = \cdots = T_\mu M_2.$$
Thus each $M_i$ in $C_1$ transforms into $E_1$ by the same number of keys.
Similarly each $E_i$ in $C_1'$ is obtained from any M in $C_1$ by the same
number of keys. It follows that this number of keys is a divisor
of the total number of keys, and hence we have properties (3) and (4).
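The construction in this proof can be traced on a small example. The cipher below is a hypothetical toy that does not come from the text. Messages and cryptograms are pairs of letters mod 3, and key k (0 to 5) adds k mod 3 to the first letter only. The cipher is pure because $T_i T_j^{-1} T_k$ is again one of the $T$'s. The sketch builds the classes exactly as described and checks two things: each M in $C_i$ reaches each E in $C_i'$ by the same number of keys, and that number divides the total number of keys.

```python
from itertools import product

# Hypothetical toy cipher (not from the text): messages and cryptograms are
# pairs (a, b) with a, b in Z_3.  Key k in 0..5 adds k (mod 3) to the first
# letter only, so keys k and k + 3 give the same transformation.
MOD = 3
KEYS = range(6)
MESSAGES = list(product(range(MOD), repeat=2))

def encipher(k, m):
    a, b = m
    return ((a + k) % MOD, b)

def decipher(k, e):
    a, b = e
    return ((a - k) % MOD, b)

def residue_classes():
    """Build message classes C_i and cryptogram classes C_i' as in the proof."""
    remaining = set(MESSAGES)
    classes = []
    while remaining:
        m1 = min(remaining)
        crypto = {encipher(k, m1) for k in KEYS}                           # all T_k M_1
        msgs = {decipher(i, encipher(j, m1)) for i in KEYS for j in KEYS}  # T_i^-1 T_j M_1
        classes.append((msgs, crypto))
        remaining -= msgs
    return classes

def keys_between(m, e):
    """Number of keys that carry message m into cryptogram e."""
    return sum(1 for k in KEYS if encipher(k, m) == e)

for msgs, crypto in residue_classes():
    counts = {keys_between(m, e) for m in msgs for e in crypto}
    print(sorted(msgs), sorted(crypto), counts)
```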
Appendix 3
Equivocation of Message for Random Cipher
As before let $M_1, \ldots, M_S$ be the high probability messages
and $M_{S+1}, \ldots, M_U$ have zero probability. Let $P(m_1, m)$ be the
probability of just $m_1$ lines going from a particular E, say $E_1$,
to a particular high probability M, say $M_1$, with a total of m
lines to all high probability M. Then
$$P(m_1, m) = \binom{k}{m}\binom{m}{m_1}\left(\frac{1}{U}\right)^{m_1}\left(\frac{S-1}{U}\right)^{m-m_1}\left(1-\frac{S}{U}\right)^{k-m}.$$
The probability of intercepting an E with m lines to high proba-
bility M's is
$$\frac{m}{kS}.$$
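The Q(M) expected can be thought of as contributed to by the
various $M_i$ in the high probability group. Thus $M_i$ contributes
$$-\frac{m_i}{m}\log\frac{m_i}{m} = \frac{m_i}{m}\log\frac{m}{m_i}$$
if there are $m_i$ lines to $M_i$ and a total of m to high probability
M's. Weighting this by the probability $m/(kS)$ of intercepting such an E,
the expected Q is then
$$Q(M) = U S \sum_{m_1, m} P(m_1, m)\,\frac{m_1}{kS}\log\frac{m}{m_1}.$$
The factor U sums over the various E and the S sums over the
different $M_i$ $(i = 1, \ldots, S)$. Hence,
$$Q(M) = \frac{U}{k}\sum_{m_1, m} P(m_1, m)\, m_1\left[\log m - \log m_1\right].$$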
In the term
$$\sum_{m_1} P(m_1, m)\, m_1,$$
the sum on $m_1$ gives P(m) times the expected $m_1$ when m lines go to high
probability M's, i.e., $P(m)\,m/S$. Hence the first term is
$$\frac{U}{kS}\sum_m m\,P(m)\log m = Q(K)$$
by our previous work. The second term is
$$-\frac{U}{k}\sum_{m_1, m} P(m_1, m)\, m_1 \log m_1.$$
If the expected $m_1$ is $\ll 1$ this term is small, since it vanishes
for $m_1 = 0$ or 1. The expected $m_1$ is $k/U$. Thus beyond this
point Q(M) approaches closely to Q(K). The point in question
is where
$$|K| = |M_0| = R_0 N,$$
or
$$N = \frac{|K|}{R_0}.$$
If the expected $m_1 \gg 1$, the $\log m_1$ can be taken out as $\log m_1 \approx
\log (k/U)$ and we have
$$-\frac{U}{k}\log\frac{k}{U}\sum_{m_1, m} P(m_1, m)\, m_1 = -\log\frac{k}{U} = |M_0| - |K|.$$
In this region then
$$Q(M) = |M_0| - |K| + Q(K),$$
but here $Q(K) = |K| - |M_0| + |M|$, and therefore
$$Q(M) = |M| = RN.$$
In the transition region the expected $m_1$ is about 1 and the expected m will in
ordinary cases be very large. It is admissible then to replace
$P(m_1, m)$ by $P(m_1)$, since this will not depend on m to any extent
except for values of m of very small probability. Thus we obtain
for this region
$$Q(M) = Q(K) - \frac{U}{k}\sum_{m_1} P(m_1)\, m_1 \log m_1.$$
The sum has the same form as our expression for Q(K) but with
$1/U$ in place of $S/U$. The calculations for Q(K) can be used,
therefore, with only a change of the N scale by a factor of …
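The two sums derived above can be evaluated directly under the random-cipher model. That model assumes each of the k lines from an E lands independently on a uniformly chosen one of the U messages, so m is binomial with parameters k and S/U, and $m_1$ is binomial with parameters k and 1/U. The sketch below uses base-2 logarithms. Its function names and parameter values are our own illustrations, and it requires S < U.

```python
from math import exp, lgamma, log, log2

def binom_pmf(n, j, p):
    """Binomial probability of j successes in n trials, computed in log space."""
    return exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
               + j * log(p) + (n - j) * log(1 - p))

def mean_x_log_x(n, p):
    """E[X log2 X] for X ~ Binomial(n, p); X = 0 and X = 1 contribute nothing."""
    return sum(binom_pmf(n, j, p) * j * log2(j) for j in range(2, n + 1))

def random_cipher_equivocations(k, S, U):
    """Q(K) and Q(M) for k keys and S equally likely high-probability
    messages out of U (requires S < U so that 1 - S/U > 0)."""
    q_key = (U / (k * S)) * mean_x_log_x(k, S / U)    # first term: m ~ Bin(k, S/U)
    q_msg = q_key - (U / k) * mean_x_log_x(k, 1 / U)   # second term: m_1 ~ Bin(k, 1/U)
    return q_key, q_msg

print(random_cipher_equivocations(2**12, 2**2, 2**4))   # k >> U
print(random_cipher_equivocations(4, 2**8, 2**10))      # k << U
```

With k much larger than U the computed Q(M) comes out close to log S = 2 bits, the RN region. With k much smaller than U it is within a few thousandths of a bit of Q(K), as the argument above indicates.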
Appendix 4
Key Appearance in Simple Substitution with Independent Letters
If successive letters are chosen independently and
the different letters have probabilities $p_1, p_2, \ldots, p_S$, we can
calculate the expected number of different letters when N
letters have been intercepted. It is
$$L(N) = S - \sum_i (1 - p_i)^N.$$
To prove this, imagine all possible sequences of N letters
written down, each with a frequency corresponding to its proba-
bility, giving a total of say A sequences. Letter 1 does not
appear in $(1 - p_1)^N A$ of these; letter 2 does not appear in
$(1 - p_2)^N A$, etc. Therefore the total number of letters missing
from sequences is
$$A \sum_i (1 - p_i)^N.$$
Dividing by A gives us by definition the expected number of
missing letters from a random sequence, $\sum (1 - p_i)^N$. The number
of different letters expected in a sequence is the total number
of letters S minus this, giving the desired result.
If all the $p_i$ are equal this reduces to $S - S(1 - 1/S)^N$,
an exponential approach to S. In the general case there is a
series of exponentials with different time constants, corre-
sponding to the different $p_i$, which are added to give L(N).
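The formula can be checked against a brute-force average over every possible sequence for a small alphabet. The three-letter probabilities below are arbitrary illustrative values. They are not English frequencies.

```python
from itertools import product
from math import prod

def expected_distinct(probs, n):
    """L(N) = S - sum_i (1 - p_i)**N for N independently chosen letters."""
    return len(probs) - sum((1 - p) ** n for p in probs)

def expected_distinct_by_enumeration(probs, n):
    """Brute-force check: probability-weighted average over every sequence of length n."""
    total = 0.0
    for seq in product(range(len(probs)), repeat=n):
        weight = prod(probs[i] for i in seq)
        total += weight * len(set(seq))
    return total

probs = [0.5, 0.3, 0.2]
for n in range(5):
    print(n, expected_distinct(probs, n), expected_distinct_by_enumeration(probs, n))
```

The enumeration grows as $S^N$, so it is only a check. The closed form is what one would use for a 26-letter alphabet.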
With the frequencies of normal English used for the
$p_i$, we obtain the curve shown in Fig. 25, along with an experi-
mental curve. The small discrepancy can be attributed to the
influences of nearby letters. In English there is less tendency
to double letters than there would be if the letters were inde-
pendent but with the same probabilities. For English the proba-
bility of a doubled digram is
$$\sum_i P(i, i) = .0315,$$
while if letters were independent it would be
$$\sum_i p_i^2 = .0670.$$
Appendix 5
A Theoretical Case Where All Invariant Statistics of E Are
Independent of K.
By an invariant statistic of a sequence of letters
$$E = \ldots, m_{-2}, m_{-1}, m_0, m_1, m_2, m_3, \ldots,$$
we will mean a statistic which is averaged along the length of
the sequence E. More precisely, a statistic of the form
$$\lim_{n\to\infty} \frac{1}{2n+1}\left[F(E_{-n}) + \cdots + F(E_{-1}) + F(E) + F(E_1) + F(E_2) + \cdots + F(E_n)\right],$$
where F is any function whose argument is a possible sequence, and
$E_{\pm n}$ is the sequence E shifted n letters to the right or left.
Such statistics as the relative frequency of a given letter, of
a given n-gram, transition frequencies, and frequencies with
which letter i is followed by letter j at a distance n are all
invariant.
We will describe a system in which every invariant
statistic which the cryptanalyst can construct from the (infinite)
intercepted E is independent of both K and M, and thus gives no
information to him. This effect, and still more, occurs with the
ideal ciphers of course, but here it is obtained independently of
the original message statistics and without any matching of the
cipher to the language.
Let N be a "random" sequence of letters,
$$N = \ldots, n_{-2}, n_{-1}, n_0, n_1, n_2, n_3, \ldots;$$
this is supposedly a known sequence (to the enemy) and thus a
part of the system, not of the key. Apply any simple cipher to
the message and then add N letter by letter to the result (mod
26). The "sum" is the enciphered message. It is evident that
any invariant statistic on E will be (with probability 1) the
same as that for a random sequence. Hence it is independent
of both K and M.
We need hardly add that such a system is easily
broken: the enemy merely subtracts N from E and then solves
the simple residual cipher, which may often be done with
invariant statistics.
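A minimal sketch of this system follows. It assumes 26-letter uppercase text, a pseudo-random N drawn from a seeded generator to stand in for the published sequence, and a random permutation as the simple cipher. All of these are illustrative choices, and the function names are ours. The sketch shows that letter frequencies of E come out flat, while subtracting the known N immediately exposes the simple substitution.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # 26 letters; arithmetic is mod 26

def to_nums(text):
    return [ALPHABET.index(c) for c in text]

def encipher(message, substitution, n_seq):
    """Simple substitution, then add the known sequence N letter by letter (mod 26)."""
    substituted = [substitution[x] for x in to_nums(message)]
    return [(s + n) % 26 for s, n in zip(substituted, n_seq)]

def strip_known_sequence(cryptogram, n_seq):
    """The enemy's first step: subtract the known N, leaving only the simple cipher."""
    return [(e - n) % 26 for e, n in zip(cryptogram, n_seq)]

def letter_frequencies(nums):
    """One invariant statistic: the relative frequency of each letter."""
    return [nums.count(x) / len(nums) for x in range(26)]

rng = random.Random(1945)                     # seeded stand-in for the published N
substitution = list(range(26))
rng.shuffle(substitution)                     # the secret key K
message = "ATTACKATDAWN" * 50
n_seq = [rng.randrange(26) for _ in message]  # known "random" sequence N

cryptogram = encipher(message, substitution, n_seq)
residual = strip_known_sequence(cryptogram, n_seq)
assert residual == [substitution[x] for x in to_nums(message)]
```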
Appendix 6
Maximum Repetition Rate in Compound Systems for a Given Total Key Size
We consider briefly the question of how to arrange the
component periods in a compound Vigenère or transposition system
to obtain the longest period for a given total key size. If the
component periods are $P_1, P_2, \ldots, P_s$, it is clear that they must
be coprime. Otherwise the total key, which is $\sum P_i$, could be re-
duced without changing the period, which is the least common
multiple of the $P_i$, merely by deleting a factor which appears in
several of the $P_i$ from all but one. Also each $P_i$ must be a power
of a prime, for if it contains two different primes, it can be divided
into these parts, reducing the key and not affecting the period. Thus
the component periods are selections from the series of primes
and powers of primes,
A: 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25, 27, ...,
the selection being pairwise coprime.
It appears from empirical evidence that the best set
of component periods for a given total size S is found by the
following process.
1. Determine the largest M such that $\sum_{j=1}^{M} p_j \le S$, where the $p_j$
are the primes in increasing order. This is the
maximum number of periods where the periods are co-
prime, and is the number of periods to be used.
2. Choose from the sequence A, M elements, consecutive
except for the fact that no prime is represented more
than once, the M elements being as great as possible
with sum ≤ S.
3. If the sum is < S, move as many as possible of the
elements in this block up a notch in the sequence,
still satisfying the conditions on the sum and copri-
mality.
4. Repeat 3 on either part of the original block if
possible. This process eventually ends and apparently
gives the proper decomposition.
For example, with S = 50 the sum of the first 6
primes is 41, of the first 7 is 58. Hence 6 periods
will be used. We have
11 + 9 + 8 + 7 + 5 + 3 = 43
13 + 11 + 9 + 8 + 7 + 5 = 53,
hence we start with the block 11, 9, 8, 7, 5, 3.
The elements 11, 9, 8, 7 can be moved up a notch to give
13 + 11 + 9 + 8 + 5 + 3 = 49.
No further improvement seems possible, and we obtain
P = 13 × 11 × 9 × 8 × 5 × 3 = 154,440.
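The text describes steps 1 to 4 as an empirical rule. For small S the rule can be checked against an exhaustive search. The sketch below is such a search, our own construction and not part of the memorandum. It looks only at pairwise-coprime prime powers whose sum is at most S, and it maximizes their product, which for coprime periods equals the least common multiple.

```python
from math import prod

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def best_periods(total):
    """Exhaustive search over pairwise-coprime prime powers with sum <= total,
    maximizing their product (equal to their LCM, the overall period)."""
    primes = primes_up_to(total)
    best = (1, [])

    def search(start, remaining, chosen):
        nonlocal best
        period = prod(chosen)
        if period > best[0]:
            best = (period, list(chosen))
        for j in range(start, len(primes)):
            p = primes[j]
            if p > remaining:
                break
            power = p
            while power <= remaining:          # at most one power of each prime
                chosen.append(power)
                search(j + 1, remaining - power, chosen)
                chosen.pop()
                power *= p

    search(0, total, [])
    return best

period, periods = best_periods(50)
print(period, periods, sum(periods))
```

For S = 50 this search returns the periods 4, 9, 5, 7, 11, 13 (sum 49, product 180,180). The block 13, 11, 9, 8, 5, 3 in the worked example contains both 9 and 3, which share the factor 3. Its least common multiple is therefore 51,480 rather than the product 154,440. This fits the text's own caution that the procedure rests on empirical evidence.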
The products and sums of the first n primes are given below.
n         1  2   3    4     5      6       7         8          9
p_n       2  3   5    7    11     13      17        19         23
Sum       2  5  10   17    28     41      58        77        100
Product   2  6  30  210  2310  30030  510510   9699690  223092870
C. E. SHANNON
Att.
Figures 1-25.
[Fig. 6: diagram with blocks labeled message source, encipherer, decipherer, key source and enemy cryptanalyst; signals labeled message M, key K and cryptogram E.]
[Fig. 8.]
[Fig. 10: pure system; message residue classes and cryptogram residue classes.]
[Fig. 16: calculation of Q curves.]
[Fig. 19.]
[Fig. 20: ideal characteristics; strongly ideal Q curve; N = number of letters.]
[Fig. 22.]
[Fig. 23.]
September 19 , l*4&-ll£S-CX3-yO
Introduction.
In classical mechanics one considers situations
where the state of a system is described by a set of numbers,
the coordinates of the phase space of the system, and the
dynamical behavior is controlled by a set of ordinary differ-
ential equations. Such a system is entirely determinate; the
future is completely specified by the present state and the
dynamical equations, since these differential equations have,
in general, a unique solution passing through a given point.
In other branches of physics (heat flow, Brownian
motion, diffusion, etc.) there are situations which can be called
completely statistical. The path of a particle of gas is
described only statistically and no determinate or mean behavior
occurs. In this case one studies the flow of probability, which
is described by a partial differential equation of the heat
flow type.
In the present memorandum I consider a partial differen-
tial equation in which both effects occur: there is a definite
"mean" motion of a system, determinate in character, carrying
its representative point through phase space in the classical
manner, with a superimposed statistical effect continually per-
turbing it from this path.
In such a case the future coordinates of the system
cannot be precisely predicted; only a probability distribution
function can be determined for the future time, whose value
times the volume element dτ is the probability that the system
will be in the volume element dτ around the point in question.
For a short time the system is substantially determinate, the
distribution being concentrated around a point which moves ac-
cording to the determinate part of the equation. As the statis-
tical effects come into play this distribution broadens out and
in general approaches a limiting distribution which is indepen-
dent of the initial state of the system.
In some respects the situation is similar to that in
quantum mechanics, where systems are described only by probabili-
ties (or more precisely by wave functions whose squared amplitudes
are probabilities). There is this difference, however: in quantum
mechanics even the initial state cannot be precisely described,
due to the uncertainty principle. Conjugate variables cannot
both be measured simultaneously with exactness. In the systems
we consider here there are assumed to be no difficulties of this
nature: all coordinates can be simultaneously and precisely
measured. This corresponds to the difference in the fundamental
equation from that of quantum mechanics. Schrödinger's equation is
for the wave function ψ, while the equation considered here deals
directly with the probability density. Thus the present work is
adapted to "…" statistical situations.
This sort of analysis may be expected to apply to
many problems where the actual situation is quite complicated
but a partial theoretical analysis is possible. This partial an-
alysis is used for the determinate part of the equation, and
the other complex disturbing effects are treated statistically.
Such situations may occur in economics, sociology, history, etc.,
as well as in many engineering and physical problems.
G. R. Stibitz in a series of memoranda has considered
a similar problem in connection with the stability of a periodically
closed servo system. In his case the phase space of the system
consisted of a set of discrete points, and the fundamental
equation is a difference equation. In the case considered here
(which was suggested by Stibitz' work) the variables are continuous
and a differential equation is involved.
In a determinate system with an n-dimensional phase space,
whose motion is described by differential equations, we have
$$\frac{dx^i}{dt} = f^i(x^1, \ldots, x^n), \qquad i = 1, \ldots, n, \qquad (1)$$
where the $x^i$ are coordinates in the phase space and t is time.
If we start with a probability distribution of points in phase space
$$P(x^1, \ldots, x^n, t)$$
giving the probability density in the differential volume element
about $x^1, \ldots, x^n$ at time t, this distribution changes with time.
Its motion is described by the partial differential equation
$$\frac{\partial P}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial x^i}\left(f^i P\right)$$
or in tensor notation
$$\frac{\partial P}{\partial t} = -\left(f^i P\right)_{,i}.$$
This is evident if we think of P as a fluid density whose velocity
field is $f^i$.
Now suppose that as the representative point of the
system moves about the phase space it is continually subject to
small disturbances, which are of a probability type. Thus the
system tends to follow the solution of (1) but is continually
being disturbed by the probability effects, which may be thought
of as something like molecular collisions of the surrounding gas
on a Brownian particle. We are interested in the limiting case
where the disturbing effects are very rapid but very small. If
we assume that the disturbance is homogeneous and isotropic,
this can be represented by an additional term in the equation of
the heat flow type,
$$\frac{\partial P}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial x^i}\left(f^i P\right) + K \nabla^2 P.$$
In the more general case certain directions may be preferred, and
certain regions may have greater perturbation effects. Thus there
will generally be a small ellipsoid of probability about each point,
and a corresponding positive definite quadratic form $a^{ij}$
defined over the phase space. This form describes the local
statistical perturbing effects for each point. The equation then
assumes the form
$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x^i}\left(a^{ij}\frac{\partial P}{\partial x^j} - f^i P\right)$$
(summed over repeated indices). This partial differential equation
governs the flow of probability in the phase space. With an ensemble
of systems distributed at t = 0 according to $P_0(x^i)$, the
distribution at a later time $t_1$ is the solution of this equation at
$t = t_1$ with the initial condition $P = P_0$.
The equation is linear and of parabolic type (in t).
In the $x^i$ it is elliptical, since $a^{ij}$ is positive definite.
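The total probability in the phase space remains constant, for if
we let
$$\oint \left(a^{ij}\frac{\partial P}{\partial x^j} - f^i P\right) n_i\, dS = 0,$$
the integral being over a sufficiently large surface, then by the
divergence theorem $\frac{d}{dt}\int P\, dV$ over the enclosed volume vanishes.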
If $a^{ij}$ is positive definite and both $a^{ij}$ and $f^i$
are continuous in the phase space, the distribution P approaches
a unique limit as $t \to \infty$. This limit is either zero everywhere
(the probability wandering off to infinity) or a definite limiting dis-
tribution $P_\infty$ with $\int P_\infty\, dV = 1$.
The limiting distribution must satisfy the elliptical
equation obtained by setting $\partial P / \partial t = 0$.
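To prove that the distribution approaches a limit, let
$P_1$ and $P_2$ be two different solutions of this equation. Then the dif-
ference $Q = P_1 - P_2$ also satisfies the equation, and Q is
positive in one region R and negative in the remainder of the
space. Consider the quantity
$$U = \int_R Q\, dV.$$
U must decrease, for
$$\frac{dU}{dt} = \int_R \frac{\partial Q}{\partial t}\, dV + \oint_S Q\, v_n\, dS,$$
where S is the surface of the region R and $v_n$ is the outward
velocity of this surface. Since Q vanishes on the surface, the
second term is zero, and the first is
$$\int_R \frac{\partial}{\partial x^i}\left(a^{ij}\frac{\partial Q}{\partial x^j} - f^i Q\right) dV.$$
Volume integrals of divergences transform by the
usual theorems into surface integrals:
$$\oint_S a^{ij}\frac{\partial Q}{\partial x^j}\, n_i\, dS - \oint_S f^i Q\, n_i\, dS.$$
The second term again vanishes since Q = 0 on S. In the first
term the outward normal $n_i$ is in the direction of $-\partial Q/\partial x^i$ at any
point of S (Q being positive inside R and zero on S), so that, $a^{ij}$
being positive definite, we have
$$a^{ij}\frac{\partial Q}{\partial x^j}\, n_i < 0.$$
Thus for any initial distributions $P_1$ and $P_2$, U is decreasing, and
the two solutions approach the same limit.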
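If $a^{ij}$ is discontinuous, P will be continuous but its derivatives
may be discontinuous, and the normal component of the probability flux
$a^{ij}\,\partial P/\partial x^j - f^i P$ must be continuous across the surface of
discontinuity ($f^i P$ being continuous). The amount of the discontinuity
in the derivatives is given by
$$\bar a^{ij}\,\frac{\partial \bar P}{\partial x^j}\, n_i = a^{ij}\,\frac{\partial P}{\partial x^j}\, n_i,$$
where the barred and unbarred letters refer to the two sides
of the discontinuity. In the simplest one-dimensional case we have
$$\bar a\,\frac{d\bar P}{dx} = a\,\frac{dP}{dx}.$$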
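If we start with a spike of probability located
at one point, the immediate behavior can be described in simple
terms. Near this point we may take $a^{ij}$ and $f^i$ to be constant.
Due to the $f^i$ the whole spike starts moving with a velocity $f^i$, while
the probability term $a^{ij}$ spreads it out. If we shift to moving
coordinates
$$y^i = x^i - f^i t$$
we have
$$\left.\frac{\partial}{\partial t}\right|_x = \left.\frac{\partial}{\partial t}\right|_y - f^i\frac{\partial}{\partial y^i}, \qquad \frac{\partial}{\partial x^i} = \frac{\partial}{\partial y^i},$$
and the equation becomes
$$\frac{\partial P}{\partial t} = a^{ij}\frac{\partial^2 P}{\partial y^i\,\partial y^j}.$$
This is the equation for heat flow in an anisotropic medium.
Thus in the $y^i$ coordinates the spike diffuses out into a normal
distribution with quadratic form $a^{ij}$ for the first short interval
of time,
$$P = \frac{|A|^{1/2}}{(4\pi t)^{n/2}} \exp\left(-\frac{A_{ij}\,y^i y^j}{4t}\right),$$
where $A_{ij}$ is the inverse form of $a^{ij}$.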
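Linear velocity field and homogeneous statistical effects.
One particular case of interest is that in which $f^i$ is a linear form
and $a^{ij}$ is constant throughout the space. For a one-dimensional phase
space, taking $f = -bx$ with b > 0 and $a^{11} = a$, the equation then
assumes the form
$$\frac{\partial P}{\partial t} = a\,\frac{\partial^2 P}{\partial x^2} + b\,\frac{\partial}{\partial x}(xP).$$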
A general solution for this case has been found. It may be described as
follows. If the initial distribution is a δ-function, i.e., the
system (or ensemble) is known to have a definite value $x_0$ at
t = 0, then at any later time t the distribution is normal,
$$P(x, t) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \bar x)^2}{2\sigma^2}\right),$$
with
$$\bar x = x_0 e^{-bt}, \qquad \sigma^2 = \frac{a}{b}\left(1 - e^{-2bt}\right).$$
Thus the mean $\bar x$ decreases along the path the system would
follow if the statistical effects were absent, while the variance $\sigma^2$
increases exponentially to a limiting value a/b with half the time
constant. To prove that this is the solution it is only necessary
to substitute it in the equation. As $t \to \infty$ the distribution
approaches a normal one centered on zero with $\sigma^2 = a/b$.
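The stated mean and variance can be compared with a direct simulation. The sketch below assumes that this one-dimensional linear case corresponds to the stochastic differential equation $dx = -bx\,dt + \sqrt{2a}\,dW$, whose Fokker-Planck equation is the one above. That Langevin form is our addition, not the memorandum's. Each step uses the exact one-step transition of that process, so only sampling error remains. The function names, seed and path count are illustrative.

```python
import math
import random

def ou_exact_moments(x0, a, b, t):
    """Mean x0 e^{-bt} and variance (a/b)(1 - e^{-2bt}) from a delta-function start."""
    return x0 * math.exp(-b * t), (a / b) * (1 - math.exp(-2 * b * t))

def simulate_ou(x0, a, b, t, n_steps=20, n_paths=20000, seed=0):
    """Sample dx = -b x dt + sqrt(2a) dW with the exact one-step update."""
    rng = random.Random(seed)
    dt = t / n_steps
    decay = math.exp(-b * dt)
    step_sd = math.sqrt((a / b) * (1 - math.exp(-2 * b * dt)))
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x = x * decay + step_sd * rng.gauss(0.0, 1.0)
        finals.append(x)
    mean = sum(finals) / n_paths
    var = sum((x - mean) ** 2 for x in finals) / (n_paths - 1)
    return mean, var

print(simulate_ou(3.0, 0.5, 1.0, 1.0), ou_exact_moments(3.0, 0.5, 1.0, 1.0))
```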
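With an arbitrary initial distribution $P_0(x)$ the solution can be
written as an integral, using the methods of solution of heat
flow problems:
$$P(x, t) = \int G(x, t \mid x_0)\, P_0(x_0)\, dx_0,$$
where $G(x, t \mid x_0)$ is the normal distribution just described for a
system starting at $x_0$.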
Some general results hold in the n-dimensional case when $f^i$ is a
linear form and $a^{ij}$ is constant. A δ-function of probability spreads
into a normal distribution, the center following the determinate
trajectory and the quadratic form which takes the place of the
standard deviation increasing exponentially toward a definite limit.
The calculations are much more complicated in this case; the
equations for the final distribution are given in the appendix.
In the one-dimensional linear case, if we start with a
normal distribution centered on zero with $\sigma^2 = a/b$, the distribution
stays steady with this form. An individual system executes
statistical motion about zero, and the ensemble of systems produces
an ensemble of time series. This ensemble can be seen to be
equivalent to thermal noise which has been passed through a filter
with transfer characteristic
$$\frac{1}{b + i\omega},$$
leading to a power spectrum for the noise proportional to
$$\frac{1}{b^2 + \omega^2}.$$
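To show this, the autocorrelation may be calculated. For systems
whose value at t = 0 is x, the distribution at t = τ is normal,
centered at $x e^{-b\tau}$, with $\sigma^2 = \frac{a}{b}\left(1 - e^{-2b\tau}\right)$. Averaging
$x(0)\,x(\tau)$ over the stationary distribution gives
$$\varphi(\tau) = \frac{a}{b}\, e^{-b|\tau|},$$
and this is the autocorrelation. The power spectrum is the Fourier
transform of this,
$$\int_{-\infty}^{\infty} \varphi(\tau)\, e^{-i\omega\tau}\, d\tau = \frac{2a}{b^2 + \omega^2}.$$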
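The Fourier-transform step can be checked numerically. The sketch below integrates the autocorrelation $(a/b)e^{-b|\tau|}$ against $\cos\omega\tau$ with the trapezoid rule. The sine part vanishes because the autocorrelation is even. It then compares the result with the closed form $2a/(b^2 + \omega^2)$. The truncation width and grid size are our illustrative choices.

```python
import math

def autocorrelation(tau, a, b):
    """phi(tau) = (a/b) exp(-b |tau|) for the stationary linear case."""
    return (a / b) * math.exp(-b * abs(tau))

def spectrum_numeric(omega, a, b, half_width=50.0, n=100000):
    """Fourier transform of phi by the trapezoid rule on [-half_width, half_width];
    tau = 0 (the kink) falls on a grid point because n is even."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n + 1):
        tau = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * autocorrelation(tau, a, b) * math.cos(omega * tau)
    return total * h

def spectrum_closed_form(omega, a, b):
    """2a / (b^2 + omega^2)."""
    return 2 * a / (b * b + omega * omega)

for omega in (0.0, 1.0, 3.0):
    print(omega, spectrum_numeric(omega, 0.5, 1.0), spectrum_closed_form(omega, 0.5, 1.0))
```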
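$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x}\left(a(x)\,\frac{\partial P}{\partial x} + f(x)\,P\right), \qquad a(x) > 0.$$
In the steady state
$$\frac{d}{dx}\left(a(x)\,\frac{dP}{dx} + f(x)\,P\right) = 0,$$
and since P and dP/dx vanish as $x \to \pm\infty$,
$$a(x)\,\frac{dP}{dx} + f(x)\,P = 0, \qquad P = A \exp\left(-\int_0^x \frac{f(\xi)}{a(\xi)}\,d\xi\right).$$
A is determined by the condition $\int P\,dx = 1$. For this, f(x) should
be positive for x > 0 and negative for x < 0, so that P decreases
exponentially toward $\pm\infty$. With a constant and f(x) = bx we obtain
the stationary solution
$$P = \sqrt{\frac{b}{2\pi a}}\; e^{-b x^2 / (2a)},$$
which is normal with $\sigma^2 = a/b$.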
DATA SMOOTHING AND PREDICTION IN FIRE-CONTROL SYSTEMS
By R. B. Blackman, H. W. Bode, and
C. E. Shannon*
THE problem of data smoothing in fire con-
trol arises because observations of target
positions are never completely accurate. If the
target is located by radar, for example, we may
expect errors in range running from perhaps
10 to 50 yards in typical cases. Angular errors
may vary from perhaps one to several mils,
corresponding at representative ranges, to
yardage errors about equal to those mentioned
for range. Similar figures might be cited for
the errors involved in optical tracking by vari-
ous devices. Evidently these errors in observa-
tion will generate corresponding errors in the
final aiming orders delivered by the fire-control
system.
A data-smoothing device is a means for mini-
mizing the consequences of observational er-
rors by, in effect, averaging the results of ob-
servations taken over a period of time. The
simplest example of data smoothing is fur-
nished by artillery fire at a fixed land target.
Here the principal parameter is the range to
the target. While individual determinations of
the range may be somewhat in error, a reliable
estimate can ordinarily be obtained by taking
the simple average of a number of such ob-
servations. This example, however, is scarcely
a representative one for problems in data
smoothing generally. The errors involved are
small and the averaging process is an elemen-
tary one. Moreover, the data-smoothing proc-
ess is not of very decisive importance in any
case, since any errors which may exist in the
estimated range can normally be wiped out
merely by observing the results of a few trial
shots.
More representative problems in data
smoothing arise when we deal with a moving
target. In this case errors in observational
data may be much more serious, since they
determine not only the present position of the
target but also the rates used in calculating
how much the target will move during the time
it takes the projectile to reach it. An illustra-
tion is furnished by antiaircraft fire against
distant airplanes. Suppose, for example, that
in observing the target's position we make two
errors of opposite sign and a second apart, of
25 yards each. Then the apparent motion of
the airplane is in error by 50 yards per second.
Since the time of flight of an antiaircraft shell
in reaching its target may be as high as 30
seconds or more, such an error might produce
a miss of the order of 1 mile. It is clear that
in any comparable situation the effect of ob-
servational errors in determining the target
rate will be much greater than the position er-
ror alone would suggest, and the function of
the data-smoothing network in averaging the
data so that even moderately reliable rates can
be obtained as a basis for prediction becomes
a critically important one.
Aside from magnifying the consequences of
small errors in target position, the motion of
the target complicates the data-smoothing
problem in two other respects. The first is the
fact that it gives us only a brief time in which
to obtain suitable firing orders. The total en-
gagement is likely to last for only a brief time,
and in any case it is necessary to make use of
the data before the target has time to do some-
thing different. Thus the averaging process
cannot take too long. The second complication
results from the fact that the true target posi-
tion is an unknown function of time rather
than a mere constant. Thus many more possi-
bilities are open than would be the case with
fixed targets, and the problem of averaging
to remove the effects of small errors is cor-
respondingly more elusive.
* Bell Telephone Laboratories.
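The arithmetic of the antiaircraft illustration can be reproduced with a short sketch. It estimates the rate as the slope of a least-squares line through the observed positions. That estimator is our choice, since the text does not specify one. A rate error becomes a miss when multiplied by the time of flight. The 100 yards-per-second target speed is a hypothetical value, and the result does not depend on it.

```python
def least_squares_rate(times, positions):
    """Slope of the least-squares straight line through the observed positions."""
    n = len(times)
    t_mean = sum(times) / n
    x_mean = sum(positions) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

def miss_from_rate_error(rate_error, time_of_flight):
    """Displacement of the predicted point caused by a rate error over the time of flight."""
    return rate_error * time_of_flight

true_speed = 100.0                     # hypothetical target speed, yards per second
times = [0.0, 1.0]
errors = [25.0, -25.0]                 # two 25-yard errors of opposite sign, 1 s apart
observed = [true_speed * t + e for t, e in zip(times, errors)]
rate_error = least_squares_rate(times, observed) - true_speed
print(rate_error, miss_from_rate_error(abs(rate_error), 30.0))
```

Fitting over a longer run of observations averages such errors down, which is the data-smoothing function described above. The price is the delay the text warns about.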
The intimate relation between data smooth-
ing and target mobility explains why the data-
smoothing problem is relatively new in war-
fare. The problem emerged as a serious one
only recently, with the introduction of new and
highly mobile military devices. The airplane is,
of course, the archetype of such mobile instru-
ments, and we have already mentioned the
data-smoothing problem as it appears in anti-
aircraft fire. Since the relative velocity of air-
plane and ground is the same whether we sta-
tion ourselves on one or the other, however, the
mobility of the airplane produces essentially
the same sort of problem in the design of bomb-
sights also. Another field exists in plane-to-
plane gunnery. Although they are somewhat
slower, the mobility of such vehicles as tanks
and torpedo boats is still considerable enough
to create a serious problem here also. Future
examples may be centered largely on robot
missiles. It is interesting to notice that a
guided missile may present a problem in data
smoothing either because it belongs to the
enemy, and is therefore something to shoot at,
or because it belongs to us, and requires
smoothing to correct errors in the data which
it uses for guidance. The tendency to higher
and higher speeds in all these devices must
evidently mean that fire control generally, and
data smoothing as one aspect of fire control,
must become more and more important, unless
war making can be ended.
Very mobile instruments of war, such as
the airplane, began to make their appearance
in World War I, but there was insufficient time
during that war to make much progress with
the fire-control problems which such instru-
mentalities imply. In the interval between
World War I and World War II, however, a
considerable number of fire-control devices,
such as bombsights and antiaircraft compu-
ters, were developed. The principal attention
in the design of these devices, however, was
on the kinematical aspects of the situation.
Although a number of them included fairly
successful methods of minimizing the effects of
observational errors,b it seems fair to say that
in the interval between the two wars there
was no general appreciation of the existence of
the data-smoothing problem as such.
It follows that the theory of data smoothing
advanced in this monograph is the result prin-
cipally of experience gained in World War II.
More specifically, it is the product of the ex-
perience of the authors with a series of proj-
ects, largely sponsored by Division 7 of NDRC,
concerned with the design of electrical antiair-
craft directors. In addition, it draws largely
on the results of a number of other investiga-
tions, also NDRC sponsored. The possible key
importance of data smoothing in the design of
fire-control systems was recognized by Division
7 early in the course of its activities and the
emphasis placed upon it in a number of proj-
ects led to the accumulation of a much larger
body of results than might otherwise have been
obtained.
b Most of these solutions depended upon the use of
special types of tracking systems. Examples are found
in the use of regenerative tracking in bombsights and
antiaircraft computers or in the determination of rates
from a precessing gyroscope or an aided laying mech-
anism in an antiaircraft tracking head. So far as their
effect on the data-smoothing characteristics of the
overall circuit is concerned, these devices are equiva-
lent to simple types of smoothing networks inserted
directly in the prediction system. This is discussed in
more detail under the heading "Exponential Smooth-
ing," Section 10.1.
Data smoothing is developed here in terms
of concepts familiar in communication engi-
neering. This is a natural approach since data
smoothing is evidently a special case of the
transmission, manipulation, and utilization of
intelligence. The other principal, and perhaps
still more fundamental, approach to data
smoothing is to regard it as a problem in sta-
tistics. This is the line followed in the classic
work1 by Norbert Wiener.c For reasons which
are brought out later, Wiener's theory is not
used in the present monograph as a basis for
the actual design of data-smoothing networks.
Because of its fundamental interest, however,
a sketch of Wiener's theory is included. The
authors' apologies are due for any mutilation
to the theory caused by the attempt to simplify
it and compress it into a brief space.
The present monograph falls roughly into
two dissimilar halves. The first half, consist-
ing of the first three or four chapters, includes
a discussion of the general theoretical founda-
tions of the data-smoothing problem, the best
established ways of approaching the prob-
lem, the assumptions they involve, and the
authors' judgment concerning the assumptions
which best fit the tactical facts. In this part
may also be included the last chapter, which
contains a fragmentary discussion of alterna-
tive data-smoothing possibilities lying outside
the main theoretical framework of the mono-
graph.
The rest of the monograph is concerned with
the technique of designing specific data-smooth-
ing structures. A fairly elaborate and detailed
treatment is given here, in the belief that the
problem of actually realizing a suitable data-
smoothing device is, in some ways at least,
as difficult as that of deciding what the general
properties of such a device should be. The
technique, as given, draws heavily upon the
highly developed resources of electric network
theory. For this reason the discussion is
couched entirely in electrical language, al-
though the authors realize, of course, that
equivalent nonelectrical solutions may exist.
For the benefit of readers who may not be
familiar with network theory, the monograph
includes an appendix summarizing the prin-
ciples most needed in the main text.
c Wiener is also responsible for providing tools which
permit the gap between the statistical and communica-
tion points of view to be bridged.
CONFIDENTIAL
Two further remarks may be helpful in un-
derstanding the monograph. The first concerns
the relation between data smoothing and the
overall problem of prediction in a fire-control
circuit. These two are coupled together in the
title of the monograph, and it is clear that the
connection between them must be very close,
since, as we saw earlier, small irregularities in
input data are likely to be serious only as they
affect the extrapolation used to determine the
future position of a moving target. In the
statistical approach, in fact, data smoothing
and prediction are treated as a single problem
and a single device performs both operations.
In the attack which is treated at greatest
length in the monograph a certain distinction
between data smoothing and prediction can be
made. To simplify the exposition as much as
possible, the explicit discussion in the mono-
graph is directed principally at data smooth-
ing. This, however, is not intended to suggest
that there is any real cleavage between the
two problems or that the analysis as developed
in the monograph does not also bear, by impli-
cation, upon the prediction problem. Any the-
ory of data smoothing must rest ultimately
upon some hypothesis concerning the path of
the target, and the exact statement of the as-
sumptions to be made is in many ways the most
important as well as the most difficult part of
the problem. The same assumptions, however,
are also involved in the extrapolation to the
future position of the target. It is thus impos-
sible to solve the data-smoothing problem with-
out also implying what the general nature of
the prediction process will be. For example,
the formulation given in Chapter 9 amounts to
the assumption that the target path is specified
by a set of geometrical parameters correspond-
ing to components of velocity, acceleration, etc.
The data-smoothing process centers about the
problem of obtaining reliable values for these
parameters. To obtain a complete prediction
thereafter, it is merely necessary to multiply
the parameter values thus obtained by suitable
functions of time of flight and add the results
to the present position of the target.
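As a minimal sketch of the extrapolation just described, assume (for illustration only) that the smoothed parameters are a velocity and an acceleration component and that the "suitable functions of time of flight" are $t_f$ and $t_f^2/2$. The function name `extrapolate` and this polynomial basis are hypothetical choices, not taken from the monograph.

```python
def extrapolate(present_position, params, time_of_flight, basis=None):
    """Future position = present position + sum of parameter * f_i(time_of_flight).

    Hypothetical sketch: by default the basis functions are t_f and t_f**2 / 2,
    so the two smoothed parameters are read as velocity and acceleration.
    """
    if basis is None:
        basis = [lambda tf: tf, lambda tf: tf ** 2 / 2.0]
    if len(params) != len(basis):
        raise ValueError("need exactly one basis function per smoothed parameter")
    correction = sum(p * b(time_of_flight) for p, b in zip(params, basis))
    return present_position + correction


# Smoothed velocity 2.0 and acceleration 0.4 (arbitrary units), time of flight 3.0
print(extrapolate(10.0, [2.0, 0.4], 3.0))  # approximately 17.8
```

All of the smoothing effort goes into the parameter values; once they are fixed, the prediction itself is only a multiply-and-add.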
The other general remark concerns the tacti-
cal criteria used in evaluating the performance
of a data-smoothing system. This turns out to
be one of the most important aspects of the
whole field. It is assumed here that the tactical
situation is similar to that of antiaircraft fire
against high-altitude bombers in World War
II. The defense can be regarded as successful if
only a fairly small fraction of the targets en-
gaged are destroyed. On the other hand, the
lethal radius of the antiaircraft shell is so small
that it is also quite difficult to score a kill.
Under these circumstances we are interested
only in increasing the number of very well
aimed shots.
When we combine these assumptions with
the path assumptions described in Chapter 9
we are led to the data-smoothing solution for-
mulated here, in preference to the solution ob-
tained with the statistical approach. On the
other hand, we might equally well envisage a
situation in which the target contained an
atomic bomb or some other very destructive
agent, so that it becomes very important to
intercept it, while the lethal radius of the anti-
aircraft missile is correspondingly increased,
so that great accuracy is not needed for a kill.
In this situation our interest would be focused
on the problem of minimizing the probability
of making large misses, and the solution furnished by the statistical approach would be approximately the best obtainable.*
* In fairness to the statistical solution it should be
pointed out that it is also the best obtainable, without
regard to the lethal radius of the shell, if we replace
the path assumptions made in Chapter 9 by a "random
phase" assumption. The path assumptions in Chapter
9 are almost at the opposite pole from a random phase
assumption, and represent a deliberate overstatement,
made in order to illustrate the theoretical situation as
clearly as possible.
Chapter 7
GENERAL FORMULATION OF THE DATA-SMOOTHING PROBLEM
ONE of the principal difficulties in any
treatment of data smoothing is that of
stating exactly what the problem is and what
criteria should be applied in judging when we
have a satisfactory solution. It is consequently
necessary to embark upon a rather extensive
general discussion of the data-smoothing prob-
lem before it is possible to consider specific
methods of designing data-smoothing struc-
tures. This preliminary survey will occupy
Chapters 7, 8, and 9. As a first step this chap-
ter will describe two of the general ways in
which the data-smoothing problem can be ap-
proached mathematically. The formulation of
the problem which is finally reached in Chap-
ter 9 is not the one which is most obviously
suggested by these approaches. This, however,
does not lessen their value in characterizing
the problem broadly.
7.1
A PHYSICAL ILLUSTRATION
In an actual fire-control system the data-
smoothing problem is usually made fairly spe-
cific because of the particular geometry
adopted in the computer. It may be helpful
to have some particular case in mind as a
touchstone in interpreting the general discus-
sion. For this purpose the most appropriate
example is furnished by long range land-based
antiaircraft fire, since most of the analysis
described in this monograph was developed
originally for its application to this problem.
It is usually assumed in the antiaircraft prob-
lem that the target flies in a straight line at
constant speed, and in one case at least the
computer operates by converting the input data
into Cartesian coordinates of target position
and differentiating these to find the rates of
travel in the several Cartesian directions.
These rates form the basis of the extrapolation.
The process is illustrated in Figure 1. The input coordinates are transformed into electrical voltages proportional to $x_P$, $y_P$, and $z_P$, the Cartesian coordinates of present position, in the coordinate converter at the left of the diagram. The extrapolation for $x$ is shown explicitly. It consists essentially in differentiating to find the $x$ component of target velocity, multiplying the derivative by the time of flight $t_f$, and adding the result to $x_P$ to find
[Figure 1. Data smoothing and prediction circuit. Recoverable labels: ELEV, AZIMUTH, COORDINATE CONVERTER (left and right), FUZE.]
$x_F$, the predicted future value of $x$. A similar procedure fixes $y_F$ and $z_F$. After the addition
of certain ballistic corrections, these three co-
ordinates of future position are transformed
into gun aiming orders in the coordinate con-
verter shown at the right of the drawing. This
last unit also provides the time of flight re-
quired as a multiplier in the extrapolation.
The small irregularities in the input data
caused by tracking errors are greatly magni-
fied by the process of differentiation. It is thus
necessary to smooth the rates considerably if
a reliable extrapolation is to be secured. The data-smoothing network for the $x$ coordinate is represented by $N_1$ in Figure 1. Since the Cartesian velocity components are theoretically
constants if the assumption of a straight line
course at constant speed is correct, a data-
smoothing network in this computer must be
essentially an averaging device which gives
an appropriately weighted average of the fluc-
tuating instantaneous rate values fed to it. The
problem of "smoothing a constant" is given
special attention in Chapter 10. Aside from the
particular circuit of Figure 1, we may, of
course, be required to smooth a constant when-
ever the prediction is based upon an assumed
geometrical course involving one or more pa-
rameters which are isolated in the circuit.
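The $x$ channel of Figure 1 can be imitated numerically to show why the rate must be smoothed. The sketch below assumes hypothetical numbers throughout (0.1 s sampling, 150 m/s constant rate, 5 m rms tracking error, 20 s time of flight) and uses a uniformly weighted average as the smoothing structure $N_1$. That uniform weighting is an illustrative assumption, not the weighting developed later in the monograph.

```python
import numpy as np

# All numbers below are hypothetical, chosen only to illustrate Figure 1's x channel.
rng = np.random.default_rng(1)
dt = 0.1                        # sampling interval, s
true_rate = 150.0               # constant x component of velocity, m/s
t = np.arange(0.0, 30.0, dt)
x_true = 1000.0 + true_rate * t
x_obs = x_true + rng.normal(0.0, 5.0, t.size)   # 5 m rms tracking error

# Differentiation: successive differences greatly magnify the tracking error.
raw_rate = np.diff(x_obs) / dt


def smoothed_rate(rates, window):
    """Uniformly weighted average of the most recent `window` rate values."""
    return float(np.mean(rates[-window:]))


rate_est = smoothed_rate(raw_rate, window=100)
t_f = 20.0                                       # time of flight, s
x_future_pred = x_obs[-1] + rate_est * t_f       # x_F = x_P + (smoothed rate) * t_f
x_future_true = x_true[-1] + true_rate * t_f

print("rms error of raw rate:", np.std(raw_rate - true_rate))
print("error of smoothed rate:", rate_est - true_rate)
print("prediction error:", x_future_pred - x_future_true)
```

With uniform weights the average of successive differences collapses to a single difference taken across the whole window, so the rate error falls roughly in proportion to the window length.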
In addition to smoothing the rates we can,
if we like, attempt to smooth the irregularities
in present position also. A network to accomplish this purpose is indicated by the broken
line structure $N_2$ in Figure 1. Of course, in
dealing with the present position we are no
longer smoothing a constant, but suitable struc-
tures can be obtained by methods described
later. However, the effect of tracking errors in
the present position circuit is so much less than
it is in the rate circuit that $N_2$ can generally
be omitted.
Geometrical assumptions of the sort implied
in Figure 1 are helpful in visualizing the prob-
lem, and they are of course of critical impor-
tance in determining what the final data-
smoothing device will be. It is important not
to make explicit assumptions of this kind too
early in the formal analysis, however, since
the meaning of such assumptions is one of the
aspects of the general problem which must be
investigated. For example, it is apparent that
no airplane in fact flies exactly a straight line,
nor flies a straight line for an indefinite period.
In detail, the solution of the data-smoothing
problem depends very largely on how we treat
these departures from the idealized straight
line path. For the present, consequently, it will
be assumed that the input data are presented
to the data-smoothing and predicting devices
in terms of some generalized coordinates, the
nature of which we will not inquire into too
closely. A given coordinate might, for example,
be a velocity, a radius of curvature, an angle of
dive or climb, or any other quantity which
would be directly useful in making a predic-
tion, or it might be a simple position coordi-
nate such as an azimuth or an altitude.
The data-smoothing and predicting opera-
tion itself is assumed to be performed by linear
invariable devices. Aside from the fact that
this assumption is, of course, a tremendously
simplifying one, it also fits the data-smoothing
problem very nicely, as the problem is formu-
lated in this chapter. With other formulations,
however, it appears that somewhat better re-
sults may be obtainable from variable devices
or devices including more or less radical
amounts of nonlinearity. These possibilities are
discussed briefly in Chapter 14.
7.2
DATA SMOOTHING AND PREDICTION
Figure 1 illustrates a distinction between
two possible methods of looking at the data-
smoothing problem which it is advisable to
establish for future purposes. In describing the
$x$ system in Figure 1 we laid emphasis on the particular networks $N_1$ and $N_2$. It is clear, however, that the complete $x$ circuit with input $x_P$ and output $x_F$ is a network having overall transmission properties which can be studied. Since $t_f$ will normally vary with time, the network is not, strictly speaking, an invariable one, but the changes of $t_f$ are ordinarily too slow to make this an essential consideration.
When it is necessary to make a distinction between these points of view, a network such as $N_1$, which is merely an element in the prediction process, will be called a data-smoothing
structure. An overall circuit, providing data
smoothing and prediction in one step, will be
called a data-smoothing and prediction net-
work, or simply a prediction network. Al-
though these points of view have been illus-
trated for rectangular coordinates, they obvi-
ously apply also in many other situations. For
example, we might go so far as to apply the
overall point of view to a complete circuit from
input azimuth, say, to output azimuth.
Both points of view are taken from time to
time in the monograph. When possible, how-
ever, principal attention has been given to the
limited data-smoothing problem. This tends to
simplify the discussion, since the limited prob-
lem is evidently more concrete than the overall
prediction problem. Moreover, it permits us to
deal lightly with such questions as the particu-
lar choice of coordinates in which the smooth-
ing operations are conducted, since it assumes
that the general kinematical framework of pre-
diction has already been decided upon. On the
other hand, the overall point of view is more
effective in certain situations, and it is the only
natural one in the statistical treatment de-
scribed in the next section.
7.3
DATA SMOOTHING AS A PROBLEM IN TIME SERIES
The most direct and perhaps the most gen-
eral approach to data smoothing consists in re-
garding it as a problem in time series. This
is the approach used by Wiener in his well-
known work.1 It essentially classifies data
smoothing and prediction as a branch of statis-
tics. The input data, in other words, are
thought of as constituting a series in time
similar to weather records, stock market prices,
production statistics, and the like. The well-
developed tools of statistics for the interpreta-
tion and extrapolation of such series are thus
made available for the data-smoothing and
prediction problem.
To formulate the problem in these terms, let $f(t)$ represent the true value of one of the coordinates of the target and let $g(t)$ represent the observational error. Then $f(t)$ and $g(t)$ are both time series in the sense just defined. The set of all such functions corresponding to the various possible target courses and tracking errors forms an ensemble of time series, or a statistical population. One can imagine that a large number of particular functions $f(t)$ and $g(t)$ have been recorded, each with a frequency proportional to its actual frequency of occurrence. Wiener assumes that they are stationary, that is, that the statistical properties of the ensemble are independent of the origin of time. This, of course, implies that both functions exist from $t = -\infty$ to $t = +\infty$. We will sometimes find it more convenient to make the assumption that the two functions vanish after some fixed, but sufficiently remote, points on the positive and negative real $t$ axis.ᵃ
The input signal to the computer is of course $f(t) + g(t)$. If we assume that the coordinate in question represents a position, the quantity we wish to obtain is $f(t + t_f)$, where $t_f$ represents the prediction time. If the coordinate is a rate, we are interested in an average value of $f(t)$ over the prediction interval. This complicates the mathematics somewhat, but does not essentially affect the situation.
ᵃ This is done for technical mathematical reasons. We shall later have occasion to consider the Fourier transforms of $f(t)$ and $g(t)$, and, to have well-defined transforms, the integrals of the squares of the two functions, from $t = -\infty$ to $t = +\infty$, should be finite. This would not happen under the "stationary" assumption. Wiener avoids the difficulty by introducing what he calls a generalized harmonic analysis, but this method is far too complicated to be treated in a brief sketch like the present.
We shall not, of course, be able to predict $f(t + t_f)$ perfectly accurately. Let the predicted value be represented by $f^*(t + t_f)$. In virtue of our assumption that the data-smoothing and prediction circuit is to be a linear invariable network, the relation between $f^*(t + t_f)$ and the total input signal $f(t) + g(t)$ can be written as

$$f^*(t + t_f) = \int_{0}^{\infty} \left[ f(t - \sigma) + g(t - \sigma) \right] dK(\sigma) \tag{1}$$

where $dK(\sigma)$ represents the effect of the data-smoothing and prediction circuit. Comparison to equations (2) and (5) of Appendix A shows that $K$ is, in fact, the indicial admittance of this circuit. The particular problem to be solved is of course that of finding a shape for the function $K(\sigma)$ which will make $f^*(t + t_f)$ the best possible estimate of $f(t + t_f)$.
The fact that the integration in equation (1) extends only over $\sigma \ge 0$, that is, only over input values at times up to and including the present time $t$, is particularly to be noted. It corresponds to the fact that
in making a prediction we are entitled to use
only the input data which has accumulated up
to the prediction instant. This restriction will
be conspicuous in the next chapter, where the
time-series analysis is completed.
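A discrete-time sketch may make the causality restriction concrete. Here the Stieltjes integral of equation (1) is replaced, as an assumption, by a finite weighted sum over the $K$ most recent input samples. The function name `causal_predict` and the two-tap linear-extrapolation weights are illustrative only; they are not a data-smoothing design from the monograph.

```python
import numpy as np


def causal_predict(x, weights):
    """Discrete analog of equation (1): the estimate formed at sample n uses only
    x[n], x[n-1], ..., x[n-K+1], weighted by weights[0], ..., weights[K-1].

    Entry n of the result is the prediction made at sample n
    (NaN until enough past samples exist).
    """
    x = np.asarray(x, dtype=float)
    K = len(weights)
    out = np.full(x.shape, np.nan)
    for n in range(K - 1, len(x)):
        past = x[n - K + 1 : n + 1][::-1]   # x[n], x[n-1], ..., x[n-K+1]
        out[n] = np.dot(weights, past)
    return out


# Hypothetical weights for linear extrapolation h samples ahead from two points:
# x[n] + h * (x[n] - x[n-1])  ->  weights (1 + h, -h)
h = 3
ramp = 2.0 * np.arange(10) + 1.0
pred = causal_predict(ramp, [1 + h, -h])
print(pred[1:7], ramp[4:10])   # for a noise-free ramp, pred[n] equals ramp[n + h]
```

Nothing in the loop ever reads a sample later than `x[n]`, which is the discrete counterpart of restricting the integration in (1) to past and present input.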
7.4
THE AUTOCORRELATION
The principal statistical tool used in studying equation (1) is the so-called autocorrelation. Under the "stationary" assumption the autocorrelation for $f(t)$ is defined by

$$\phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau)\, f(t)\, dt. \tag{2}$$

We can obtain a normalized autocorrelation, which is more convenient for some purposes, by dividing by $\phi_1(0)$. This gives

$$\phi_{1N}(\tau) = \frac{\phi_1(\tau)}{\phi_1(0)} = \lim_{T \to \infty} \frac{\int_{-T}^{T} f(t + \tau)\, f(t)\, dt}{\int_{-T}^{T} [f(t)]^2\, dt}. \tag{3}$$

If we assume that $f(t)$ in fact vanishes for sufficiently large positive or negative values of $t$, the limit sign can be disregarded and $\phi_{1N}(\tau)$ becomes simply

$$\phi_{1N}(\tau) = \frac{\int_{-\infty}^{\infty} f(t + \tau)\, f(t)\, dt}{\int_{-\infty}^{\infty} [f(t)]^2\, dt}, \tag{4}$$

where the denominator, $\int_{-\infty}^{\infty} [f(t)]^2\, dt$, represents the total "energy" in the time series $f(t)$.
Precisely similar expressions can be set up for the autocorrelation $\phi_2(\tau)$ or $\phi_{2N}(\tau)$ of the observational error function $g(t)$. In a general case we might also have to worry about a possible cross correlation between $f(t)$ and $g(t)$. This would be represented by a cross-correlation function $\phi_{12}(\tau)$, obtained by integrating the product $f(t + \tau)g(t)$. In practical fire control, however, it can be assumed that the correlation between target course and tracking errors is small enough to be neglected.
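As a simple example of the calculation of an autocorrelation we may assume that $f(t) = \sin \omega t$. Then

$$\begin{aligned}
\phi_1(\tau) &= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \sin \omega(t + \tau)\, \sin \omega t \, dt \\
&= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \tfrac{1}{2}\left[\cos \omega\tau - \cos(2\omega t + \omega\tau)\right] dt \\
&= \tfrac{1}{2}\cos \omega\tau,
\end{aligned} \tag{5}$$

since the term $\cos(2\omega t + \omega\tau)$ will contribute nothing in the limit.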
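Equation (5) can be checked numerically by replacing the limit with a long but finite interval and the integral with an average over a fine uniform grid. The values of $\omega$ and $T$ below are arbitrary assumptions. The check at $\tau = 40$ also shows the behavior discussed next: the autocorrelation of a pure sinusoid does not die away.

```python
import numpy as np

omega = 1.3                          # arbitrary angular frequency for the check
T = 2000.0                           # finite stand-in for the limit T -> infinity
t = np.linspace(-T, T, 400001)       # uniform grid on [-T, T]


def phi1(tau):
    """Grid average of sin(omega (t + tau)) sin(omega t), approximating (1/2T) * integral."""
    return float(np.mean(np.sin(omega * (t + tau)) * np.sin(omega * t)))


for tau in (0.0, 0.5, 2.0, 40.0):
    print(f"tau={tau:5.1f}  numeric={phi1(tau): .5f}  "
          f"(1/2)cos(omega tau)={0.5 * np.cos(omega * tau): .5f}")
```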
The maximum value of $\phi_1(\tau)$ in (5) is found at $\tau = 0$. This is to be expected, since obviously the correlation between identical values of the function is the best possible. What is exceptional about the present result is the fact that $\phi_1(\tau)$ is not small for all large $\tau$'s. This is fundamentally a consequence of the fact that we chose an analytic expression for $f(t)$, so that the relation between two values of the function is completely determinate, no matter how great the difference between their arguments. In a more representative time series, involving a certain amount of statistical uncertainty, we would expect $\phi_1(\tau)$ to approach zero as $\tau$ increases, reflecting the increasing importance of statistical dispersion as the time interval becomes greater.
The significance of the autocorrelation func-
tion for data smoothing and prediction is ob-
vious without much study. Thus, suppose for
simplicity that the observational error $g(t)$ is zero. Then the autocorrelation $\phi_1(\tau)$ is the only one involved. It is a measure of the extent to which the true target path "hangs together" and is thus predictable. For example, in weather forecasting it is a well-known principle that in the absence of any other information it is a reasonably good bet that tomorrow's weather will be like today's but that the reliability of such a prediction diminishes rapidly if we attempt to go beyond two or three days. This would correspond to an autocorrelation function which is fairly large in the neighborhood of $\tau = 0$, but diminishes rapidly to zero thereafter.
In a similar way the autocorrelation of the observational error $g(t)$ represents the extent to which this error hangs together. In this case, however, a high correlation is exactly what we do not want. Thus, if $\phi_2(\tau)$ vanishes rapidly as $\tau$ increases from zero, closely neighboring values of $g$ are quite uncorrelated, and we need only average the input data over a short interval in the immediate past in order to have most of the observational errors averaged out. If $\phi_2(\tau)$ is substantial for a much longer range, on the other hand, a much longer averaging period is necessary, with correspondingly greater uncertainties in the value obtained for $f(t)$.
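The contrast between weakly and strongly correlated errors can be illustrated by averaging each over the same short window. The sketch assumes, purely as a convenient model, a unit-variance first-order autoregressive error whose autocorrelation at lag $k$ is $\rho^k$. It compares $\rho = 0$ (uncorrelated samples) with $\rho = 0.95$ (errors that "hang together"). The window length and trial count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)


def ar1_noise(n_trials, n_samples, rho):
    """Unit-variance first-order autoregressive noise; lag-k autocorrelation is rho**k."""
    e = np.empty((n_trials, n_samples))
    e[:, 0] = rng.normal(size=n_trials)
    scale = np.sqrt(1.0 - rho ** 2)
    for k in range(1, n_samples):
        e[:, k] = rho * e[:, k - 1] + scale * rng.normal(size=n_trials)
    return e


window = 50
results = {}
for rho in (0.0, 0.95):
    residual = ar1_noise(4000, window, rho).mean(axis=1)   # error left after averaging
    results[rho] = float(residual.var())
print(results)   # the correlated error survives the same averaging far better
```

For uncorrelated errors the residual variance is close to `1 / window`; for the strongly correlated case it is many times larger, so a much longer averaging period would be needed.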
7.5
THE LEAST SQUARES ASSUMPTION
The autocorrelation function does not in itself suffice to determine a time series completely. For example, it is easily seen that the
functions sin t + sin 2t and sin t + cos 2t have
the same autocorrelation in spite of the fact
that they represent waves of quite different
shape. The autocorrelation function, however,
has a peculiar importance in the fact that
under many circumstances it is the only piece
of information about the time series which we
need to know.
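The example of $\sin t + \sin 2t$ and $\sin t + \cos 2t$ can be verified on one sampled period, where the time average is exact. The helper name `periodic_autocorr` and the 512-point grid are illustrative assumptions. Both autocorrelations equal $\tfrac12\cos\tau + \tfrac12\cos 2\tau$, although the waveforms differ.

```python
import numpy as np

N = 512
t = 2 * np.pi * np.arange(N) / N          # one common period, sampled uniformly
a = np.sin(t) + np.sin(2 * t)
b = np.sin(t) + np.cos(2 * t)


def periodic_autocorr(x):
    """phi[k] = mean over n of x[n + k] * x[n] for a sampled periodic signal."""
    return np.array([np.mean(np.roll(x, -k) * x) for k in range(len(x))])


same_autocorr = np.allclose(periodic_autocorr(a), periodic_autocorr(b))
same_shape = np.allclose(a, b)
print(same_autocorr, same_shape)   # True False
```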
The significance of the autocorrelation be-
comes apparent as soon as we investigate the
error in prediction. In many mathematical sit-
uations involving linear systems it is conven-
ient to deal with the square of the error rather
than with the error itself, since a first varia-
tion in the error squared expression gives a
linear relationship in the quantities of direct interest. We will deal with the square of the error here. If $E$ represents the instantaneous error, $f^*(t + t_f) - f(t + t_f)$, the mean square error over a long period of time is evidently

$$\begin{aligned}
\overline{E^2} &= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \left[ f^*(t + t_f) - f(t + t_f) \right]^2 dt \\
&= \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} [f(t + t_f)]^2\, dt
 - \lim_{T\to\infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, f^*(t + t_f)\, dt
 + \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} [f^*(t + t_f)]^2\, dt.
\end{aligned} \tag{6}$$

The first integral in equation (6) can be evaluated immediately. From (2) it is $\phi_1(0)$. To evaluate the second integral replace $f^*(t + t_f)$ by its definition from (1). This gives

$$-\lim_{T\to\infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, dt \int_0^\infty [f(t - \tau) + g(t - \tau)]\, dK(\tau)
= -\lim_{T\to\infty} \frac{1}{T} \int_0^\infty dK(\tau) \int_{-T}^{T} [f(t + t_f)\, f(t - \tau) + f(t + t_f)\, g(t - \tau)]\, dt$$

if we reverse the order of integration. Since we assume that $f$ and $g$ are uncorrelated, however, the product $f(t + t_f)\,g(t - \tau)$ in this expression makes no contribution to the final result, and by replacing the integral of $f(t + t_f)\,f(t - \tau)$ by its value in terms of $\phi_1$ the expression as a whole can be written as

$$-2 \int_0^\infty \phi_1(t_f + \tau)\, dK(\tau).$$

The third integral in (6) can be simplified in similar fashion. The final result becomes

$$\overline{E^2} = \phi_1(0) - 2 \int_0^\infty \phi_1(t_f + \tau)\, dK(\tau) + \int_0^\infty dK(\tau) \int_0^\infty [\phi_1(\tau - \sigma) + \phi_2(\tau - \sigma)]\, dK(\sigma). \tag{7}$$
The only quantities appearing in equation (7) are the autocorrelations, $\phi_1$ and $\phi_2$, of the true target path and the observational error, and the function $K$ which specifies the data-smoothing structure. The theoretical problem with which we are confronted is evidently that of choosing $K$ to make the mean square error as small as possible for any given $\phi$'s. This
problem will not be attacked here, although a
solution obtained by a somewhat indirect
method is presented in the next chapter. The
principal reason for deriving equation (7) is
to demonstrate the very important fact that
the mean square error depends only upon the
two autocorrelations. No other characteristics
of the input data need be considered.
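A discrete analog of equation (7) can be checked against a direct computation of the mean square error. The sketch assumes periodic sequences (so the time average over one period is exact), a three-tap set of weights standing in for $dK$, and a lead of two samples standing in for $t_f$. The "path" $f$ and "error" $g$ are hypothetical sinusoids at disjoint frequencies, which makes them exactly uncorrelated, as equation (7) requires.

```python
import numpy as np

N = 64
n = np.arange(N)
# Hypothetical periodic "target path" f and "tracking error" g with disjoint frequencies,
# so that f and g are exactly uncorrelated over one period.
f = np.sin(2 * np.pi * 1 * n / N) + 0.5 * np.cos(2 * np.pi * 2 * n / N)
g = 0.3 * np.sin(2 * np.pi * 20 * n / N)
w = [0.5, 0.3, 0.2]   # hypothetical weights: the discrete counterpart of dK
h = 2                 # prediction lead, in samples (stands in for t_f)


def autocorr(x, m):
    """phi(m) = mean over n of x[n + m] * x[n], treating x as periodic."""
    return float(np.mean(np.roll(x, -m) * x))


def mse_direct(f, g, w, h):
    """Time average of (f*[n + h] - f[n + h])**2 with f*[n + h] = sum_k w[k] (f + g)[n - k]."""
    x = f + g
    pred = sum(wk * np.roll(x, k) for k, wk in enumerate(w))   # np.roll(x, k)[n] == x[n - k]
    return float(np.mean((pred - np.roll(f, -h)) ** 2))        # np.roll(f, -h)[n] == f[n + h]


def mse_from_autocorr(f, g, w, h):
    """Discrete analog of equation (7): uses only the autocorrelations of f and g."""
    total = autocorr(f, 0)
    total -= 2 * sum(wk * autocorr(f, h + k) for k, wk in enumerate(w))
    total += sum(wj * wk * (autocorr(f, k - j) + autocorr(g, k - j))
                 for j, wj in enumerate(w) for k, wk in enumerate(w))
    return total


print(mse_direct(f, g, w, h), mse_from_autocorr(f, g, w, h))   # agree to rounding error
```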
It will be recalled that the mean square cri-
terion was introduced originally on the ground
of mathematical convenience. This leaves un-
settled the question of how good a measure of
performance for a data-smoothing network it
actually is. This is a critical question, since
upon it depends the validity of the whole ap-
proach outlined in this chapter. A priori, the
least squares criterion is a dubious one since
it gives principal weight to large errors. In
fire control we are normally interested only in
shots which are close enough to register as hits.
If a shot misses it makes little difference
whether the miss is large or small. The merits
of the least squares criterion are considered
in more detail in Chapter 9, where the conclu-
sion is reached that the criterion is probably
adequate for many problems but needs to be
supplemented or replaced in others, including
the special case of heavy antiaircraft fire to
which particular attention is given in this
monograph. Pending the discussion in Chapter
9, the least squares criterion will be assumed
to be a valid one, with the understanding that
the analysis is intended primarily for its value
in contributing to the general understanding of
the data-smoothing problem rather than as a
means of fixing the exact proportions of an op-
timal smoothing network.
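A small numerical example shows how the least squares criterion and a hit criterion can rank the same two sets of shots in opposite order. The miss distances and the lethal radius below are hypothetical numbers chosen only to make the point. One set is usually very close but has one wild miss; the other is uniformly mediocre.

```python
import numpy as np


def rms(errors):
    """Root mean square miss distance."""
    return float(np.sqrt(np.mean(np.square(errors))))


def hit_fraction(errors, lethal_radius):
    """Fraction of shots whose miss distance is within the lethal radius."""
    return float(np.mean(np.abs(errors) <= lethal_radius))


aim_a = np.array([0.1] * 9 + [5.0])   # usually very close, one wild miss
aim_b = np.array([1.0] * 10)          # uniformly mediocre
r = 0.5                               # hypothetical lethal radius

print("A: rms", rms(aim_a), "hits", hit_fraction(aim_a, r))
print("B: rms", rms(aim_b), "hits", hit_fraction(aim_b, r))
```

Set B has the smaller mean square error, yet only set A scores any hits, because the single large miss dominates A's squared-error total.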
7.6
DATA SMOOTHING AS A FILTER PROBLEM
The time-series approach to data smoothing
is closely associated with another which at first
sight may seem quite different. This second
approach is suggested by the procedures used
in communication engineering. Here the sig-
nals, be they voice, music, television, or what
not, are again time series. Instead of dealing
with actual signals varying in a more or less
irregular and random manner with time, how-
ever, it is customary to deal with their equiva-
lent steady-state components on the frequency
spectrum.ᵇ
The analysis of data smoothing can conven-
iently be approached by supposing that both
the true path of the target and the effects of
tracking errors are represented, in a similar
way, by their frequency spectra. When the
situation is presented in this way, however,
there is an obvious analogy between the prob-
lem of smoothing the data to eliminate or re-
duce the effect of tracking errors and the prob-
lem of separating a signal from interfering
noise in communication systems. We may take
as an example of the latter the transmission
of voice or music by ordinary radio over fairly
long distances, so that the effects of static in-
terference are appreciable. In such a system
a reasonable separation of the desired signal
from the static can be obtained by means of
a filter. In a representative situation an ap-
propriate filter might transmit frequencies up
to perhaps 2,000 or 3,000 cycles per second,ᶜ
while rejecting higher frequencies.
The choice of any specific cutoff, such as
2,000 or 3,000 c, in the radio system depends
upon a compromise between conflicting consid-
erations. Both speech or music and static nor-
mally include components of all frequencies
which can be heard by the human ear. Thus,
suppressing any frequency range below the
limits of audibility, at perhaps 10,000 or 20,000
c, will injure the signal to some extent. The
intensity of the signal components, however,
diminishes rapidly above 2,000 or 3,000 c, while
the energy of the static interference is more
evenly distributed over the spectrum. Thus, by
transmitting only the first 2,000 or 3,000 c, we can
retain most of the signal while rejecting most
of the noise. Naturally, the exact dividing line
will depend upon the relative levels of signal
and noise power. If the static interference is
quite weak, for example, it would be worthwhile to transmit a considerably wider band in order to retain a more nearly perfect signal. If the static level is extremely high, on the other hand, it would be necessary to transmit a still narrower band at the cost of greater mutilation of the signal.
ᵇ The review of communication theory given in Appendix A shows how this equivalence is established by Fourier or Laplace transform methods.
ᶜ In practice, of course, the filtering would probably take place in the radio-frequency circuits, but it is more convenient here to think of it occurring in the demodulated output.
The separation of the true path of a target
from the observed path including tracking
errors, as a preliminary to prediction of the
future position of the target, presents an ap-
proximately analogous situation. Again the
spectrum of the "signal" or true path is con-
centrated principally in a low-frequency band,
in most instances, while the energy of tracking
errors or "noise" appears principally at con-
siderably higher frequencies. Thus the two can
be separated by a low-pass filter. The separa-
tion, however, is not complete since some com-
ponents of the signal spectrum extend into the
noise region. Thus the smoothing process must
be accompanied by some mutilation of the sig-
nal, and the optimum compromise is again
attained from a filter which transmits a rela-
tively broad band when the tracking errors are
of low intensity and a much narrower band
when they are large.
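The compromise described above can be sketched with an idealized low-pass filter: for each noise level, scan the cutoff and keep the one giving the smallest mean square error. The signal spectrum (components in bins 1 to 40 falling off as $1/k^2$), the white noise, and the brick-wall filter are all assumptions made for illustration. The brick-wall filter is non-causal, so it illustrates only the frequency-domain trade-off, not a realizable data-smoothing network.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4096
k = np.arange(N // 2 + 1)

# Hypothetical "true path": components in bins 1..40 whose amplitudes fall off as 1/k**2.
amp = np.zeros(k.size)
amp[1:41] = 1.0 / k[1:41] ** 2
phase = rng.uniform(0.0, 2 * np.pi, k.size)
signal = np.fft.irfft(amp * np.exp(1j * phase) * N / 2, n=N)   # amp[k] * cos(2 pi k n / N + phase[k])


def ideal_lowpass(x, cutoff_bin):
    """Remove every Fourier component above cutoff_bin (non-causal; for illustration only)."""
    X = np.fft.rfft(x)
    X[cutoff_bin + 1:] = 0.0
    return np.fft.irfft(X, n=len(x))


def best_cutoff(noise_rms, cutoffs=range(1, 200)):
    """Cutoff (in frequency bins) that minimizes the mean square error for one noise level."""
    noisy = signal + rng.normal(0.0, noise_rms, N)   # white "tracking error"
    errors = [np.mean((ideal_lowpass(noisy, c) - signal) ** 2) for c in cutoffs]
    return list(cutoffs)[int(np.argmin(errors))]


best_low_noise = best_cutoff(0.01)
best_high_noise = best_cutoff(1.0)
print(best_low_noise, best_high_noise)   # the noisier case favours a narrower band
```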
In these terms the most obvious difference
between the data-smoothing problem and the
static interference problem in the radio system
is in the order of magnitude of the frequencies
involved. They are roughly 10,000 times smaller
in the data-smoothing case. Thus, the typical
signal band in a fire-control system may cover
a few tenths of a cycle per second, in compari-
son with a useful band of 2,000 or 3,000 c in a
radio system, and the spectrum of tracking
errors or noise, with representative tracking
devices, includes appreciable components up to
perhaps 2 or 3 c, in comparison with a total
effective noise band in the radio system ex-
tending to the limits of audibility at perhaps
20,000 c.
This analogy between data smoothing and
the filtering problems which appear in ordi-
nary communication systems transmitting
speech or music must of course not be carried
too far. For example, previous experience with
communication filters is of no help in fixing in
detail the cutoff or attenuation characteristic
of the data-smoothing filter, since in communi-
cation systems these choices depend on psycho-
logical considerations of no relevance in the fire-
control problem. Methods of determining the
best rules for proportioning a data-smoothing
filter, therefore, remain to be determined. We
may also notice that, whereas the time-series
approach was of the data-smoothing and pre-
diction type, the filter approach emphasizes
data smoothing only. The addition of the pre-
diction function can be expected to change ma-
terially the overall characteristics of the cir-
cuit. Neither of these remarks, however, robs
the filter approach of its value as a simple way
of thinking about the problem qualitatively.
7.7
RELATION BETWEEN TIME-SERIES AND FILTER APPROACHES
The time-series and filter methods of looking
at data smoothing are related to one another
by the fact that the autocorrelation can be com-
puted from the amplitude spectrum, or vice
versa, by Fourier transform means. Consider,
for example, the Fourier transform of the
autocorrelation. If we make use in particular
of (4) we have
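$$\begin{aligned}
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi_{1N}(\tau)\, e^{-i\omega\tau}\, d\tau
&= \frac{1}{\sqrt{2\pi} \int_{-\infty}^{\infty} [f(t)]^2\, dt} \int_{-\infty}^{\infty} e^{-i\omega\tau}\, d\tau \int_{-\infty}^{\infty} f(t + \tau)\, f(t)\, dt \\
&= \frac{1}{\sqrt{2\pi} \int_{-\infty}^{\infty} [f(t)]^2\, dt} \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt \int_{-\infty}^{\infty} f(t + \tau)\, e^{-i\omega(t + \tau)}\, d\tau \\
&= \frac{\sqrt{2\pi}}{\int_{-\infty}^{\infty} [f(t)]^2\, dt}\, |F(\omega)|^2,
\end{aligned} \tag{8}$$

where

$$F(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t + \tau)\, e^{-i\omega(t + \tau)}\, d\tau. \tag{9}$$

Since $f(t)$ is real, $\int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt = \sqrt{2\pi}\, \overline{F(\omega)}$, which gives the last line of (8).
$F(\omega)$ is of course the steady-state spectrum of the signal $f(t)$. Equation (8) thus states that the Fourier transform of $\phi_{1N}$ is equal to a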
constant times the square of the amplitude of
the steady-state spectrum. The amplitude
squared spectrum is, however, a measure of
the power per cycle. The relation is therefore
equivalent to the statement that the autocorre-
lation and power spectrum are Fourier trans-
forms of each other.
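The discrete counterpart of this statement is easy to confirm with NumPy's FFT: for a real sequence treated as periodic, the DFT of its (unnormalized) periodic autocorrelation equals the squared magnitude of its DFT. The random test sequence is an arbitrary assumption. The discrete transform convention differs from (8) and (9) by constant factors, so only the form of the relation is being illustrated.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=256)          # any real sequence, treated as periodic

# Direct periodic autocorrelation: phi[k] = sum_n x[n + k] x[n]
phi = np.array([np.dot(np.roll(x, -k), x) for k in range(x.size)])

# Power spectrum |X|^2 from the DFT
power = np.abs(np.fft.fft(x)) ** 2

# Discrete statement of the result: the DFT of phi equals |X|^2
print(np.allclose(np.fft.fft(phi), power))   # True
```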
Since we have already established the fact
that the mean square error in prediction de-
pends only on the autocorrelation, this analysis
enables us to conclude immediately that the
mean square error can also be calculated from
the power spectra of the signal and noise. It
is entirely independent of the phase relations
in either signal or noise. The phase character-
istics of the data-smoothing network, which
operates on the signal after a specific wave
shape has been established, are, of course, still
of consequence.
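The independence from phase can be illustrated by building two "signals" with the same power spectrum but different, randomly chosen phases, and applying the same fixed predictor to each. The spectra, the three-tap weights, and the two-sample lead are hypothetical. The noise is placed at frequencies disjoint from the signal, so signal and noise are exactly uncorrelated. The wave shapes differ, but the mean square errors agree to rounding error.

```python
import numpy as np

N = 64
rng = np.random.default_rng(5)


def from_spectrum(mags, phases):
    """Real periodic signal of length N with one-sided Fourier magnitudes `mags` and given phases."""
    return np.fft.irfft(mags * np.exp(1j * phases), n=N)


def mse_direct(f, g, w, h):
    """Time-average squared error of the prediction sum_k w[k] (f + g)[n - k] of f[n + h]."""
    x = f + g
    pred = sum(wk * np.roll(x, k) for k, wk in enumerate(w))
    return float(np.mean((pred - np.roll(f, -h)) ** 2))


mags_f = np.zeros(N // 2 + 1)
mags_f[1:6] = [8.0, 6.0, 4.0, 2.0, 1.0]     # hypothetical low-frequency "signal" spectrum
mags_g = np.zeros(N // 2 + 1)
mags_g[20:26] = 1.5                          # hypothetical higher-frequency "noise" spectrum
g = from_spectrum(mags_g, rng.uniform(0.0, 2 * np.pi, N // 2 + 1))

w, h = [0.6, 0.3, 0.1], 2                    # hypothetical smoothing weights and lead
f1 = from_spectrum(mags_f, rng.uniform(0.0, 2 * np.pi, N // 2 + 1))
f2 = from_spectrum(mags_f, rng.uniform(0.0, 2 * np.pi, N // 2 + 1))   # same power spectrum, new phases

m1, m2 = mse_direct(f1, g, w, h), mse_direct(f2, g, w, h)
print(np.allclose(f1, f2), m1, m2)           # different wave shapes, equal mean square errors
```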
7.8
PHYSICAL AND TACTICAL CONSIDERATIONS
Thus far the material which has been pre-
sented has been primarily mathematical. It
has consisted, in other words, of outlines of
general analytical methods which are available
for use with the data-smoothing problem. It is
also possible to approach the problem in a
much more concrete fashion. It is obvious that
by giving thought to the details of the physical
characteristics of tracking units and targets,
and to the tactical situations with which we
expect to deal, it should be possible to draw a
number of specific conclusions about the prob-
lem as a whole. In a general theory of the de-
sign and tactical use of fire-control apparatus
such an approach might well be a primary one.
It is scarcely possible to follow it in detail in
the present discussion. The following para-
graphs, however, indicate some of the kinds of
considerations which can be brought into the
problem in this way. It will be seen that they
tend to modify the strictly mathematical ap-
proach, partly by qualifying to some extent the
assumptions made in the mathematics, and
partly by tending to give much more emphasis
to particular aspects of the problem than would
appear in a general analytic outline.
Choice of Coordinates
One of the most obvious omissions in the
general analysis thus far is any consideration
of the choice of coordinates in which the data
smoothing is to take place. So far as either
the statistical or filter theory is concerned, the
coordinates in the data smoother may repre-
sent either the original tracking data or any
transformation of them. The fact that there is
actually something to be decided here, however,
is easily seen from the long-range antiaircraft
problem. The input tracking coordinates for
antiaircraft would normally be azimuth, eleva-
tion, and slant range. If the airplane flies in a
straight line roughly overhead, the general
shape of the azimuth and the azimuth rate as
functions of time are given by the curves in
Figure 2. The curves become indefinitely
[Figure 2. Azimuth and azimuth rate for crossing target. Axes: azimuth A (mils), azimuth rate (mils/sec), time t (sec).]
steeper as the target path approaches the
zenith, and it will be seen that if the approach
is reasonably close, either the azimuth or the
azimuth rate must include a very substantial
amount of high-frequency energy. Since the
possibility of an effective separation between
the signal and noise in the filter approach de-
pends upon the assumption that the signal com-
ponents are of quite low frequency with respect
to the noise, the presence of this high-frequency
energy is evidently serious.
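A minimal sketch of the geometry behind Figure 2, assuming a target on a straight, level course at constant speed and a director at the origin. Azimuth depends only on the horizontal position, so altitude drops out; the speed and horizontal offsets below are illustrative values, not data from the report. The peak azimuth rate equals speed/offset, and the rate pulse lasts roughly offset/speed seconds, so the closer the course passes over the director, the higher the frequencies the azimuth-rate coordinate must carry.

```python
import numpy as np

def azimuth_and_rate(t, speed, offset):
    """Azimuth (radians) and azimuth rate (radians per second) of a target on
    the straight course x = speed * t, y = offset (horizontal plane), seen
    from a director at the origin. Closest approach occurs at t = 0."""
    x = speed * t
    azimuth = np.arctan2(offset, x)
    # d/dt arctan2(offset, speed * t) = -speed * offset / (x**2 + offset**2)
    rate = -speed * offset / (x**2 + offset**2)
    return azimuth, rate

speed = 150.0                            # yards per second (illustrative)
t = np.linspace(-60.0, 60.0, 12001)      # seconds
for offset in (2000.0, 500.0, 100.0):    # horizontal miss distance, yards
    _, rate = azimuth_and_rate(t, speed, offset)
    print(f"offset {offset:6.0f} yd: peak |rate| {np.max(np.abs(rate)):.3f} rad/s, "
          f"speed/offset {speed / offset:.3f}, pulse width ~{offset / speed:.1f} s")
```

Shrinking the offset by a factor of 20 raises the peak rate by the same factor and compresses the pulse in time, which is the growth in high-frequency energy the text describes.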
When the target describes a violently evasive
path the signal spectrum must naturally in-
clude substantial high-frequency components,
whatever the coordinate system may be. The
high-frequency components indicated in Figure
2, however, are due to the fact that the target
path happens to pass almost over the director
and are essentially superimposed upon the
high-frequency components which reflect the
complexity of the target path itself. It is clear
as a matter of principle that an acceptable
coordinate system for data smoothing should
not introduce frequency components which de-
pend upon such accidental factors as the loca-
tion and orientation of the coordinate system.
The rectangular system mentioned in connec-
tion with Figure 1 evidently meets this condi-
tion; so also does the "intrinsic" system de-
scribed in the next section.
Physical Limitations of Target or Tracker
We may also approach the data-smoothing
question by a consideration of the motions
which are physically possible either in the
target or in the tracking device. In the heavy
antiaircraft problem, for example, there are
substantial physical limitations on the performance possibilities of present-day aircraft.
We can be quite sure that any motion incom-
patible with these limitations is necessarily a
tracking error and can be removed from the
incoming data. Naturally, these limitations
must appear in the power spectrum of the sig-
nal if they affect the mean square error in pre-
diction, so that their existence in no way dis-
putes the mathematical framework we have
set up. Consideration of the physical factors
which produce them, however, may permit
them to be established more easily or in more
clear-cut fashion than would be possible from
a statistical examination of target records
alone.
The limitations on airplane performance
can be stated most simply when the motion of
the airplane is expressed in so-called intrinsic
coordinates. These are the speed of the air-
plane, its heading, and its angle of dive or
climb. The maneuvering possibilities of a con-
ventional airplane in these three directions are
quite unequal. By banking sharply it can
maneuver violently to the right and left and
thus make quick changes in heading. The pos-
sibilities of maneuvering up and down, how-
ever, are considerably less, particularly for a
heavy airplane, where there are usually restric-
tions on the maximum angle of dive or climb
which can be assumed. The possibilities of
quickly changing the speed of the airplane,
finally, are almost nil. The thrust of an air-
plane propeller is so small in comparison with
the mass of the airplane that only small accelerations are possible.*
Thus the optimum filters for the three coor-
dinates should be different. The one for speed
can have a very narrow band, since most of
the signal energy for this coordinate occurs at
very low frequencies. The optimum band for
the angle of dive or climb, however, should be
larger (unless it turns out that pilots seldom
make use of maneuvering possibilities in this
direction) and the one for the heading larger
still. In this ability to discriminate among the
various possible directions of motion the in-
trinsic coordinate system is evidently an im-
provement even on the rectangular system.
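The conversion into intrinsic coordinates can be sketched as follows. The axis convention (x east, y north, z up), heading measured clockwise from north, and the function name are assumptions for illustration. A smoother built on these outputs would then give the speed channel the narrowest band and the heading channel the widest, as the text suggests; heading wraps at 2π, so it would need unwrapping before any filtering.

```python
import math

def intrinsic_coordinates(vx, vy, vz):
    """Convert a Cartesian velocity (x east, y north, z up) to intrinsic
    coordinates: speed, heading (radians clockwise from north, in [0, 2*pi)),
    and climb angle (radians, positive in a climb, negative in a dive)."""
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    heading = math.atan2(vx, vy) % (2.0 * math.pi)
    climb = math.asin(vz / speed) if speed > 0.0 else 0.0
    return speed, heading, climb

# Level flight due east, then a 30-degree dive heading due north.
print(intrinsic_coordinates(100.0, 0.0, 0.0))
print(intrinsic_coordinates(0.0, 86.60254037844386, -50.0))
```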
Settling Time
Another aspect of the data-smoothing prob-
lem which has not been given conspicuous at-
tention in the purely mathematical discussion
is the fact that in an actual tactical situation
questions of elapsed time are of great importance. Engagements usually begin suddenly
and last for a comparatively brief period, and
it is important to find a data-smoothing scheme
which provides adequate firing data as quickly
as possible after an engagement starts. A situ-
ation essentially similar to the beginning of an
engagement may also be presented whenever
the target makes a sudden change of course or
whenever it is necessary to shift from one
target to another in a given attacking body.
The time required for a computer to give
usable output data after any of these events is
its so-called "settling time," and is one of the
most important parameters of any data-
smoothing system. It is possible to make rough
estimates of settling time by indirect means in
both the statistical and filter theories of data
smoothing, but no explicit consideration of
necessary time lapses appears in either theory.
Evidently, the fundamental fault lies with the
"stationary" assumption.
* This ignores the possibility of changing the speed
through gravitational forces. Since these possibilities
are linked to the angle of dive or climb, however, they
can be predicted. This has actually been done in one
experimental computer.
Effect of Human Factors
Aside from the conditions on target perform-
ance which arise from the physical character-
istics of the target itself, there are others
which are due to the fact that the target is
under the control of a human being with a
definite purpose. The language of the statistical
and filter methods is broad enough to cover
almost any situation. It tends to suggest, how-
ever, that the typical target paths with which
we deal are the relatively structureless conse-
quences of random physical forces. The inter-
vention of purposive human behavior, on the
other hand, tends to give paths which fall into
more or less definite patterns. A simple illus-
tration is furnished by the argument which is
frequently offered in defense of the straight
line assumption in dealing with antiaircraft
defense against heavy bombers. It is contended
that while the targets may in fact engage in
substantial evasive maneuvers during most of
their flight, there will always be a substantial
period during the bombing run in which they
must fly very straight in order to achieve
bombing accuracy. On the basis of ordinary
probability we would of course expect substan-
tial straight line segments quite infrequently
if the course as a whole shows marked disper-
sion, and the intervention of the human pilot
thus provides a higher degree of structure than
one would expect in a corresponding situation
dominated by purely natural factors.
A broader example is furnished by a com-
parison of two airplanes, or perhaps more
simply of two boats, one of which is under the
control of a human operator, while in the other
the steering controls are lashed in a neutral
position. Both boats, say, may be expected to
experience small variations of course due to the
random effects of wind and waves upon them.
Over a short period of time the observed mo-
tions of the two boats should be substantially
identical. In the case of the boat with the
lashed helm these random variations will tend
to accumulate, so that it is possible to make a
reasonable prediction of the position of the
boat for only a comparatively short distance
in the future. In the boat with the human
steersman, on the other hand, we may expect
corrections to be applied as soon as the random
effects become large, so that the boat tends to
retain the same general course and it is pos-
sible to predict its position hours or even days
later from a relatively brief observation.
Neither of these illustrations is inconsistent with the mathematical framework laid down earlier in the chapter, in a purely theoretical sense. For example, the bombing run illustration merely states that because of the presence of the human operator there are definite phase relations in the input signal. As we have seen, such relations can exist without affecting computations based on mean square error. The comparison between the piloted and pilotless boats can be interpreted as the result primarily of differences in the signal power spectrum. In the case of the pilotless boat, for example, the signal occupies a fairly continuous low-frequency band, while in the case of the piloted boat it must be regarded as concentrated very closely around zero frequency, so that it is approximately a line spectrum superimposed on a continuous one. The formal mathematical theory covers also such cases as these.
The point of this discussion, however, is that the mathematical theory, although it is sufficiently general in a formal sense, fails to differentiate between such situations as those just described and the more shapeless sort involving continuous spectra with random phase relations, even if the special features in these situations may be the controlling factors in determining the actual probability of hitting. If we could believe the bombing run hypothesis, for example, and had a sufficiently accurate computer and gun, we could expect to score a hit in every engagement, no matter how large the mean square error might be. More generally, it is probably only the tendency of targets to exhibit "line spectra" which prevents the real probability of a kill, small at best, from becoming microscopic. It is necessary to lay special emphasis on these factors in order to keep the overall fire-control picture in perspective.
CRITERION OF PERFORMANCE
Last on this list of doubts about the statistical and filter theories, we may mention the least squares criterion of accuracy. This was discussed before, but it is mentioned again as a matter of emphasis, and because of its close relation with the factors we have just discussed. For example, the bombing run illustration obviously represents one situation in which the mean square error is not a good guide to the actual probability of scoring a hit.
Chapter 8
STEADY-STATE ANALYSIS OF DATA SMOOTHING
It was shown in the previous chapter that
both the statistical and filter theory ways of
looking at the data-smoothing problem lead
naturally to an analysis in terms of the power
spectra of the signal and noise. The phase rela-
tions are not important as long as we accept
the mean square error as a criterion of per-
formance. The inadequacies of the mean square
criterion will finally force us to abandon the
steady-state attack in favor of a direct analysis
in terms of the wave shapes of some assumed
signals. The steady-state attack is nevertheless
a very useful one. This chapter will conse-
quently continue the analysis from this point
of view. It will be assumed as heretofore that
the heavy antiaircraft problem is the particular
subject of interest.
A large part of the discussion hinges upon
the conditions which must be satisfied by the
external characteristics of an electrical net-
work if it is to be capable of physical realiza-
tion in any way whatever. These limitations
and the characteristics which may be postulated
for physical networks are decisive since, in the
absence of such restrictions, no limits could be
set upon the performance which might be ex-
pected from data-smoothing and predicting
circuits. The facts about physically realizable
networks which we shall find of most use are
summarized below, but the reader not familiar
with this field is urged to read also the account
given in Sections A.9 and A.10, Appendix A.
The conditions which must be satisfied by
physically realizable networks can be stated in
either transient or steady-state terms. In tran-
sient terms they are expressed most simply by
the statement that the response of a physical
network to an impulsive force must be zero up
to the time the force is applied. Thus the net-
work has no power to predict a purely arbi-
trary event. That is, it has no way of foresee-
ing whether or not an impulse is actually going
to be applied to it. This characteristic of physi-
cal networks is taken as a postulate.
The steady-state limitations on physical networks are expressed in terms of their attenuation and phase characteristics. They may be
derived either from the transient specification
or from the postulate that a physical network
must be stable. There are no important limita-
tions to be placed upon the attenuation and
phase characteristics of physical networks as
long as we deal with these characteristics separately, but there are very severe limitations on
the phase characteristic which can be associated
with any given attenuation characteristic or
vice versa. In particular, when the attenuation
characteristic is prescribed, there is a definite
formula for calculating the unique limiting
phase characteristic with which it may be associated. This is the so-called "minimum phase"
characteristic because any other physical net-
work having the postulated attenuation char-
acteristic must have as great or greater phase
shift at every frequency. As we shall see later,
this greater phase characteristic would corre-
spond to longer lags in obtaining usable data,
so that the minimum phase characteristic is
the optimum for a data-smoothing network.
The minimum phase characteristic has the addi-
tional important property that not only does
it specify the transfer admittance of a physical
network, but the reciprocal of that transfer
admittance can also be realized by a physical
structure.*
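The loss-phase formula referred to here is stated for continuous frequency. As a hedged, discrete-time stand-in, the sketch below recovers the minimum phase from a sampled magnitude characteristic with the real-cepstrum folding method, a related but different technique from the integral formula the text cites. The test filter 1/(1 - 0.5 e^{-iω}) is a hypothetical example chosen because it is known to be minimum phase, so the phase computed from its magnitude alone can be compared with the exact phase.

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Phase (radians) of the minimum-phase response whose magnitude is `mag`,
    given on the full FFT grid w_k = 2*pi*k/n (n even). The real cepstrum of
    log|H| is folded onto non-negative quefrencies (made causal); the
    imaginary part of its transform is the minimum phase."""
    n = len(mag)
    cepstrum = np.fft.ifft(np.log(mag)).real
    folded = np.zeros(n)
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]
    folded[n // 2] = cepstrum[n // 2]
    return np.fft.fft(folded).imag

n = 4096
w = 2 * np.pi * np.arange(n) / n            # digital frequency, radians per sample
H = 1.0 / (1.0 - 0.5 * np.exp(-1j * w))     # hypothetical minimum-phase filter
phase = minimum_phase_from_magnitude(np.abs(H))
print(np.max(np.abs(phase - np.angle(H))))  # near machine precision
```

The magnitude alone fixes the phase here, which is the discrete counterpart of the statement that a prescribed attenuation characteristic determines a unique limiting phase characteristic.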
In addition to this principal formula for the
relation between attenuation and phase there
are a number of subsidiary expressions for
special aspects of the problem. One in partic-
ular, relating the attenuation to the behavior
of the phase characteristic in the neighborhood
of zero frequency, is used extensively in this
chapter.
* In limiting cases, such as may be found when the
transfer admittance contains zeros or poles exactly on
the real frequency axis, the "physical structure" may
require such constituents as ideally nondissipative re-
actances, perfect amplifiers with unlimited gain, etc.
This, however, is of no consequence for the present
general discussion.
" 1 THE SIGNAL SPECTRUM
It is natural to begin with a discussion of the
spectrum of a typical target path. Unfortu-
nately no data on the spectra of actual meas-
ured airplane paths exist, and the theoretical
assumptions which may be made about paths
of airplane targets are best discussed in the
next chapter. This section consequently will be
confined to rather general observations about
the problem. It will be convenient to assume
for definiteness that the quantities to be
smoothed are the velocity components in Car-
tesian coordinates.
The simplest point of departure is furnished
by the conventional assumption that the target
flies in a straight line at constant speed. If we
could construe this assumption literally, it
would mean that the velocity spectrum in rec-
tangular coordinates would reduce to a single
line at zero frequency. In practice, of course,
the spectrum is not so simple. Even in the
absence of deliberate maneuvering, the target
will fly a slightly curved path because of
"wander." Moreover, even if the target could
fly exactly straight, the single line spectrum
would apply only to a straight course in-
definitely continued. The spectrum becomes
more complicated if we consider the fact that
tracking must have begun at some finite time
in the past, or that the target may presumably
change occasionally from one straight line
course to another.
As a result of both these causes, the actual
signal spectrum must be regarded as occupying
a band bordering on zero frequency. The distri-
bution of energy in detail will, of course,
depend on particular circumstances. The band
has no very well defined upper limit, but in
most cases the great bulk, at least, of the
energy should be below, say, one-fourth or one-
fifth of a cycle per second. For example, the
natural periods of a heavy airplane, which one
would expect to be correlated with wander, are
below this limit. This limit is also sufficient to
include most of the energy resulting from
changes in course occurring as frequently as
every ten or twenty seconds.
In general, it is to be supposed that the signal spectrum varies as $1/\omega^{2n}$, where n may be 1, 2, 3, depending on the frequency range. This follows from general considerations of the limitations of airplane performance. Thus, if we suppose that the velocity changes discontinuously from time to time, it follows from general Fourier principles that the amplitude must vary as $1/\omega$. This is presumably a fair representation of the actual signal spectrum at low frequencies. At moderate frequencies, however, we must take account of the fact that the velocity can actually be changed rapidly but not discontinuously, and we consequently assume that the amplitude begins to vary as $1/\omega^2$. Finally, at frequencies of the order of perhaps one cycle per second one must take account of the fact that the airplane must bank in order to turn. Since it takes some time to roll into the bank, even the acceleration in the lateral direction cannot be discontinuous, and consequently the amplitude must begin to vary as $1/\omega^3$. The application of such successive limiting factors in constructing a complete spectrum is described in more detail in Section A.8 of Appendix A.
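The successive limiting factors can be made concrete with a piecewise power-law asymptote. This is a sketch only: the break frequencies w1 and w2 are hypothetical (w2 is placed near one cycle per second, following the text), and a real spectrum would round off the corners. Because the power spectrum is the square of the amplitude, its log-log slope steps through -2, -4, -6, that is, 1/ω^{2n} with n = 1, 2, 3.

```python
import numpy as np

def asymptotic_power_spectrum(w, w1, w2):
    """Piecewise power-law asymptote for the velocity power spectrum.
    Amplitude ~ 1/w below w1 (velocity changes treated as discontinuous),
    ~ 1/w**2 from w1 to w2 (finite acceleration), and ~ 1/w**3 above w2
    (finite time to roll into a bank). The pieces join continuously, and the
    power spectrum is the squared amplitude, i.e. 1/w**(2n) with n = 1, 2, 3."""
    w = np.asarray(w, dtype=float)
    amplitude = np.where(w < w1, 1.0 / w,
                         np.where(w < w2, w1 / w**2, w1 * w2 / w**3))
    return amplitude**2

w1 = 2 * np.pi * 0.05   # hypothetical acceleration break, rad/s
w2 = 2 * np.pi * 1.0    # hypothetical bank break near one cycle per second
bands = [(0.01, 0.1), (1.0, 3.0), (20.0, 60.0)]   # rad/s, one band per regime
for lo, hi in bands:
    ratio = asymptotic_power_spectrum(hi, w1, w2) / asymptotic_power_spectrum(lo, w1, w2)
    print(f"{lo}-{hi} rad/s: log-log slope {np.log(ratio) / np.log(hi / lo):.1f}")
```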
One other general condition of the same kind
can be mentioned. It can be shown that the integral
$$\int_0^\infty \frac{\log H}{1 + \omega^2}\,d\omega,$$
where H is the power spectrum, is very important in determining the properties of a time series. More explicitly, the integral converges if the series is essentially statistical, so that we cannot foretell the future from the past with absolute certainty. This of course is the case with an actual signal spectrum in a fire-control problem. It implies two consequences: first, that H cannot be zero over any finite band; and second, that in the neighborhood of infinite frequency H diminishes slowly enough so that $|\log H|/\omega \to 0$.
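The convergence condition can be checked numerically. The sketch below (function names are illustrative) uses the substitution ω = tan u, which turns dω/(1 + ω²) into du and maps the infinite range onto [0, π/2). For the rational spectrum H = 1/(1 + ω²) the integral settles at -π log 2 ≈ -2.178, while for the Gaussian spectrum H = exp(-ω²) it grows without bound as the upper limit increases; by the criterion quoted above, a Gaussian-shaped spectrum therefore could not belong to a series that is essentially statistical.

```python
import numpy as np

def log_spectrum_integral(log_H, u_max, n=200_001):
    """Trapezoidal estimate of the integral of log H(w) / (1 + w**2) over
    0 <= w <= tan(u_max). With w = tan(u), dw / (1 + w**2) = du, so the
    integrand becomes log H(tan u) on the finite interval [0, u_max]."""
    u = np.linspace(0.0, u_max, n)
    y = log_H(np.tan(u))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)

def log_rational(w):
    """log H for H(w) = 1 / (1 + w**2)."""
    return -np.log1p(w**2)

def log_gaussian(w):
    """log H for H(w) = exp(-w**2)."""
    return -w**2

for eps in (1e-1, 1e-2, 1e-3):
    u_max = np.pi / 2 - eps                  # upper limit w = tan(u_max), about 1/eps
    print(f"w up to {np.tan(u_max):7.0f}: rational "
          f"{log_spectrum_integral(log_rational, u_max):8.4f}, gaussian "
          f"{log_spectrum_integral(log_gaussian, u_max):9.1f}")
print(-np.pi * np.log(2.0))                  # limit of the rational case, about -2.178
```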
8.2 THE NOISE SPECTRUM
The spectrum of tracking errors depends
largely upon the particular sort of tracking
equipment involved. Broadly speaking, optical
tracking equipment (at least that of the present
or recent past) tends to produce tracking errors
not only of small amplitude, but also of low
frequency, so that they are hard to separate
from the signal spectrum. Radar equipment of
the present time produces higher-frequency
errors. Relatively high-frequency errors are
particularly likely to be found in very stiff
automatic tracking radars.
A number of examples of spectra of tracking
errors are shown in Figures 1, 2, and 3. The
spectra are given directly in terms of range
and angle errors. To make them comparable
with the velocity spectra described previously
[Figure 1. Power spectrum of range errors of experimental radar (rms = 30 yd; median frequency = 0.022 cps).]
it would be necessary to multiply all amplitudes by $\omega$. In addition, it would of course also be necessary to multiply the angle rates by some suitable range in order to compare them directly with the yards-per-second rates we have otherwise considered.
After multiplication by $\omega$, the radar spectra appear to be about flat up to perhaps one cycle per second.
Beyond that point they no doubt drop off
slowly, although the accuracy of the data is not
sufficient to permit the situation to be stated
very exactly.
8.3 RANDOM NOISE FUNCTIONS
The properties of the signal and noise as we
assume them here can be conveniently
expressed by reference to the theory of so-called
"random noise" functions.h A random noise can
be defined as a function which has a definite
amplitude spectrum but completely random
phase characteristics. The theory of such func-
tions is well developed because of their frequent
[Figure 2. Power spectrum of angular height errors of experimental radar (rms = 1.0 mil; median frequency = 0.53 cps).]
occurrence in physics. It is probable that
neither our noise functions nor our signal func-
tions are, strictly speaking, random noise ac-
cording to this definition. Thus, there are proba-
bly certain definite phase relations in our noise
functions because of the physical character-
istics of tracking devices. There is no evidence,
however, that any such relations are important
enough to be significant in the data-smoothing
problem, so that we are fully justified in iden-
tifying them with random noise functions as
defined above. The phase relations in the signal
are by no means random. As long as we con-
sider only the mean square error, however, this
factor is immaterial, and we can replace the
actual signal by a random noise function with
the same power spectrum for purposes of
analysis.
* The fact that we also refer to tracking errors as "noise" is, of course, merely a coincidence.
The most familiar example of a random
noise function is furnished by the thermal
voltage across a resistance R. This is a random
noise whose spectrum is constant up to very
high frequencies with the value $P = 4kTR$ (k
is Boltzmann's constant and T the absolute
temperature). A second example is black body
[Figure 3. Power spectrum of traverse errors (rms = 1.4 mil; median frequency = 0.31 cps).]
radiation. If there is black body radiation in a
space, the electric (or magnetic) field intensity
at a point is a random noise function with
spectrum
$$P(f) = \frac{8\pi h f^3}{c^3}\,\frac{1}{e^{hf/kT} - 1}$$
according to Planck's law. Random noise func-
tions also occur in the Schottky effect, in
Brownian motion, and in diffusion and heat
flow problems.
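The flat thermal spectrum P = 4kTR can be turned into numbers with a few lines of arithmetic. The sketch below integrates it over an assumed bandwidth, and also evaluates the factor (hf/kT)/(e^{hf/kT} - 1) from the quantum form of Nyquist's formula (zero-point terms ignored), which indicates where the statement "constant up to very high frequencies" stops holding. The resistor, temperature, and frequencies are illustrative values.

```python
import math

K_BOLTZMANN = 1.380649e-23    # joules per kelvin (exact SI value)
H_PLANCK = 6.62607015e-34     # joule-seconds (exact SI value)

def thermal_noise_rms(resistance, temperature, bandwidth):
    """RMS open-circuit noise voltage of a resistor: the flat spectrum
    P = 4kTR (volts squared per cycle per second) times the bandwidth."""
    return math.sqrt(4.0 * K_BOLTZMANN * temperature * resistance * bandwidth)

def planck_factor(frequency, temperature):
    """(hf/kT) / (exp(hf/kT) - 1): how far the quantum form of Nyquist's
    formula falls below the flat 4kTR value at a given frequency."""
    x = H_PLANCK * frequency / (K_BOLTZMANN * temperature)
    return x / math.expm1(x)

print(thermal_noise_rms(1e3, 290.0, 1e6))    # about 4.0e-6 volts
print(planck_factor(1e9, 290.0))             # very nearly 1
print(planck_factor(1e13, 290.0))            # about 0.39
```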
For purposes of analysis, a random noise
function can be thought of as a function made
up of a large number of sinusoidal components,
which are very closely spaced in frequency
and whose phases are completely random. Thus a random noise can be represented as
$$\sum_{n=1}^{N} a_n \cos(\omega_n t + \phi_n)$$
where $\omega_n = 2\pi n\,\Delta f$, $\Delta f$ being the frequency difference between adjacent components. The phase angles $\phi_n$ are random variables which are independent with a uniform probability distribution from 0 to $2\pi$. As $\Delta f$ decreases the functions in this ensemble approach, in a certain sense, a limiting ensemble, provided the amplitudes $a_n$ are adjusted properly. What is desired is to have the total power in the neighborhood of each frequency approach a certain limit $P(f)$, the power spectrum at that frequency. To do this we make
$$a_n^2 = 2P(f_n)\,\Delta f.$$
In the limiting ensemble the total power within a small frequency range $\Delta f$ is then $P(f)\,\Delta f$. The function $P(f)$ completely describes the random noise ensemble from the statistical point of view.
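The random-phase representation can be sketched directly. Assuming an illustrative spectrum P(f) = 1/(1 + f²) (not one from the report), each sample path is a sum of cosines with a_n² = 2P(f_n)Δf and independent uniform phases. Sampling over exactly one period 1/Δf makes the components orthogonal, so the time-averaged power equals the sum of P(f_n)Δf, a Riemann sum for the total power.

```python
import numpy as np

def random_noise(power_spectrum, df, n_components, t, rng):
    """One member of the random-noise ensemble
        x(t) = sum_n a_n cos(2*pi*f_n*t + phi_n),  f_n = n*df,
    with a_n**2 = 2 P(f_n) df and independent phases phi_n, uniform on
    [0, 2*pi). Returns the samples x(t) and the amplitudes a_n."""
    f = df * np.arange(1, n_components + 1)
    a = np.sqrt(2.0 * power_spectrum(f) * df)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    # Rows index time samples, columns index the sinusoidal components.
    x = np.cos(np.outer(t, 2.0 * np.pi * f) + phi) @ a
    return x, a

rng = np.random.default_rng(0)
df, n_components = 0.01, 500                   # spacing in cycles per second
t = np.arange(2048) / (2048 * df)              # exactly one period, 1/df
P = lambda f: 1.0 / (1.0 + f**2)               # illustrative power spectrum
x, a = random_noise(P, df, n_components, t, rng)
total_power = np.sum(P(df * np.arange(1, n_components + 1)) * df)
print(np.mean(x**2), total_power)              # equal up to rounding
```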
A particularly important special case is that
of a random noise with a constant power spec-
trum. This is often called "flat" or "white"
noise. True constancy out to infinite frequencies
is of course impossible since it would imply an
infinite total power in the function. The idea
is, however, still useful and can be approxi-
mated, as with resistance noise, by having a
spectrum which is constant out to such high
frequencies that behavior beyond this point is
of no importance to the problem. We may con-
veniently think of flat random noise as being
made up of a succession of weak impulses oc-
curring frequently but at random times with
respect to one another. This results from the
fact that a Fourier analysis of a single impulse
gives a flat spectrum, and the random occur-
rence of many of them produces a random set
of phases. In a physical problem, such as resis-
tance noise or Brownian motion, these im-
pulses might correspond to the effects of indi-
vidual small particles. Such a situation is of
course completely chaotic. If the impulses are
large and occur relatively infrequently, the
power spectrum is still flat, though the func-
tion is no longer a random noise function as
defined here. This conception, which corre-
sponds to a physical situation including definite
causative elements, will be revived later under
the name of the elementary pulse method of
analysis.
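The impulse picture of flat noise can be tried numerically. This sketch assumes discrete time slots and impulses of random sign (so the train has zero mean); all parameter values are illustrative. Dense weak impulses and sparse strong ones with the same product rate × amplitude² give the same flat average periodogram, which illustrates why the power spectrum alone cannot distinguish the chaotic case from one with a few definite causative events.

```python
import numpy as np

def mean_periodogram(rate, amplitude, n_samples, n_trials, rng):
    """Average periodogram |X_k|**2 / n_samples of impulse trains in which each
    time slot independently holds an impulse with probability `rate`, of size
    `amplitude` and random sign (so the train has zero mean)."""
    occurs = rng.random((n_trials, n_samples)) < rate
    signs = rng.choice([-1.0, 1.0], size=(n_trials, n_samples))
    x = amplitude * signs * occurs
    X = np.fft.rfft(x, axis=1)
    return np.mean(np.abs(X) ** 2, axis=0) / n_samples

rng = np.random.default_rng(1)
dense = mean_periodogram(0.5, 1.0, 1024, 1000, rng)      # many weak impulses
sparse = mean_periodogram(0.005, 10.0, 1024, 1000, rng)  # few strong impulses
for name, s in (("dense", dense), ("sparse", sparse)):
    inner = s[1:-1]                                      # drop DC and Nyquist bins
    # Both halves of the band sit near rate * amplitude**2 = 0.5: a flat spectrum.
    print(name, inner[:256].mean(), inner[256:].mean())
```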
Random noise functions have a number of
interesting characteristics. For example, they
have the "ergodic property." This means that
averaging a statistic along the length of a particular random function gives the same results as averaging the same statistic over an ensemble of functions having the same power spectrum. Each function is typical of the ensemble. To be more precise one must admit exceptions, but the probability of an exception is zero. For example, if we determine the fraction of time a given random function f(t) has a value greater than some constant A, it will be equal to the fraction of all functions in the ensemble which are greater than A at t = 0 (with probability 1).
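The ergodic statement can be checked on the random-phase model under illustrative assumptions: 1000 equal-amplitude components (a flat band), unit total power, and threshold A = 1. The fraction of time one sample path spends above A and the fraction of many independent paths above A at t = 0 should both be close to the normal-law value 0.159; with finite samples the agreement is approximate rather than exact with probability 1.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n_components, df = 1000, 0.01
freqs = df * np.arange(1, n_components + 1)
amps = np.full(n_components, math.sqrt(2.0 / n_components))  # unit total power
A = 1.0                                                      # the constant A

# Time average: one sample path, sampled across one full period 1/df.
t = np.arange(4096) / (4096 * df)
phases = rng.uniform(0.0, 2.0 * np.pi, n_components)
x_path = np.cos(np.outer(t, 2.0 * np.pi * freqs) + phases) @ amps
frac_time = np.mean(x_path > A)

# Ensemble average: many independent sample paths, each looked at only at t = 0.
ensemble_phases = rng.uniform(0.0, 2.0 * np.pi, size=(4000, n_components))
frac_ensemble = np.mean(np.cos(ensemble_phases) @ amps > A)

normal_law = 0.5 * math.erfc(A / math.sqrt(2.0))             # about 0.159
print(frac_time, frac_ensemble, normal_law)
```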
A second characteristic of random noise
functions is the fact that they frequently lead
to Gaussian or normal law distributions. For
example, the amplitudes of a random noise function are distributed about zero in accordance with the normal error law. Likewise, the
amplitudes for two points spaced a given dis-
tance apart form a two-dimensional normal
error law distribution when we consider all
possible positions of the first point. It is ap-
parent that if the signal and noise are actually
random functions the mean square error is as
good a criterion of performance as any other,
since it completely fixes the distribution in a
normal law case.
A final property of random noise functions
is the fact that if a random noise is passed
through a filter the output is still a random
noise. If the power spectrum of the noise is $P(\omega)$ and the transfer characteristic of the filter is $Y(i\omega)$, the output spectrum is $P(\omega)\,|Y(i\omega)|^2$. In particular, if we take the derivative of a random noise with spectrum $P(\omega)$ we obtain one with spectrum $\omega^2 P(\omega)$.
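The output-spectrum rule follows from the component representation: a linear filter scales each sinusoid by |Y| and shifts its phase by arg Y, so each component's power is multiplied by |Y|². The sketch applies this to an illustrative first-order low-pass 1/(1 + iωτ) (τ is a hypothetical value) and to differentiation, Y(iω) = iω, and checks the output mean square against the sum of P|Y|²Δf. The spectrum is written as a function of f in cycles per second, with ω = 2πf.

```python
import numpy as np

def filter_components(a, f, phi, Y, t):
    """Output of a linear filter Y(i*w) driven by the sum of cosines
    sum_n a_n cos(2*pi*f_n*t + phi_n): each component is scaled by |Y|
    and shifted in phase by arg Y, evaluated at w = 2*pi*f_n."""
    gain = Y(2j * np.pi * f)
    arg = np.outer(t, 2.0 * np.pi * f) + phi + np.angle(gain)
    return np.cos(arg) @ (a * np.abs(gain))

rng = np.random.default_rng(3)
df, n = 0.01, 500                                # spacing in cycles per second
f = df * np.arange(1, n + 1)
P = lambda f: 1.0 / (1.0 + f**2)                 # illustrative input spectrum
a = np.sqrt(2.0 * P(f) * df)                     # a_n**2 = 2 P(f_n) df
phi = rng.uniform(0.0, 2.0 * np.pi, n)
t = np.arange(2048) / (2048 * df)                # exactly one period, 1/df

tau = 0.5                                        # hypothetical time constant, s
filters = {"low-pass": lambda s: 1.0 / (1.0 + s * tau),
           "derivative": lambda s: s}            # Y(i w) = i w
results = {}
for name, Y in filters.items():
    y = filter_components(a, f, phi, Y, t)
    predicted = np.sum(P(f) * np.abs(Y(2j * np.pi * f)) ** 2 * df)
    results[name] = (np.mean(y**2), predicted)
    print(name, *results[name])                  # measured and predicted agree
```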
This last property of random noise functions
suggests a method of representing them which
we shall find useful in the future. The method
is represented by Figure 4. It consists of a
[Figure 4. Circuit representation of random functions: flat noise source followed by shaping filter.]
source of flat noise followed by a shaping filter
to give the desired power spectrum. We can
easily assign to the filter the characteristics of
a physically realizable structure by making use
of the relations between attenuation and phase
mentioned earlier in the chapter. It is merely
necessary to convert the desired power spec-
trum into a specification of the attenuation
characteristic of the filter and then use the
loss-phase formula to compute the correspond-
ing phase shift. It will be assumed that this
procedure has been followed when we make use
of this circuit at a later point.
The method of representing random functions shown by Figure 4 illustrates graphically
the basis of the prediction schemes described
thus far. The flat noise is of course absolutely
unpredictable. The history of the function up
to any given instant gives no indication of its
value even a microsecond later. The filter, how-
ever, forces the output current to have a cer-
tain structure on which a prediction may be
based. For example, if the filter will pass only
very low frequencies it is clear that the output
can change very little in a microsecond.
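A discrete-time analogue of Figure 4 makes the point about predictability concrete. The sketch assumes a one-pole recursive shaping filter y[k] = ρ y[k-1] + e[k] driven by white Gaussian noise, with a hypothetical pole value ρ = 0.9. The flat input shows no correlation from one sample to the next, while the filtered output does, and a one-step prediction ρ y[k-1] that uses only past data removes most of the output variance. The inverse filter e[k] = y[k] - ρ y[k-1] is also causal, in keeping with the minimum-phase property mentioned earlier.

```python
import numpy as np

def shaped_noise(rho, n_samples, rng):
    """Flat (white) Gaussian noise e[k] driving the causal one-pole shaping
    filter y[k] = rho * y[k-1] + e[k]. Only past outputs are used, so the
    filter is physically realizable."""
    e = rng.standard_normal(n_samples)       # flat noise source, unit variance
    y = np.empty(n_samples)
    y[0] = e[0] / np.sqrt(1.0 - rho**2)      # start in the steady state
    for k in range(1, n_samples):
        y[k] = rho * y[k - 1] + e[k]
    return e, y

rng = np.random.default_rng(4)
rho = 0.9                                    # hypothetical pole position
e, y = shaped_noise(rho, 200_000, rng)
print(np.corrcoef(e[:-1], e[1:])[0, 1])      # near 0: flat noise is unpredictable
print(np.corrcoef(y[:-1], y[1:])[0, 1])      # near rho: the filter imposes structure
print(np.var(y), 1.0 / (1.0 - rho**2))       # variance matches the shaped spectrum
print(np.mean((y[1:] - rho * y[:-1]) ** 2))  # one-step prediction error, near 1
```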
8.4 THEORETICAL PROPORTIONS FOR A DATA-SMOOTHING FILTER
The signal and noise spectra furnish the raw
material from which a suitable data-smoothing
filter can be deduced. We have still to deter-
mine, however, the exact rule for choosing the
cutoff and attenuation characteristic of the
filter from these spectra. It is clear that previ-
ous experience with signal-to-noise problems
in systems transmitting voice or music is no
help, since the filter proportions here depend
upon psychological considerations of no rele-
vance to the fire-control problem. For example,
the interfering effect of a small amount of
noise is much greater than one might expect
from energy considerations, especially in in-
tervals of low message level, and it is con-
sequently worth while to maintain a relatively
high level of attenuation in the noise band.
Conversely, the breadth of the band required
for the message depends as much on the ability
of the ear to reconstruct a complete signal
from an incomplete one as it does upon the
actual signal power spectrum.
In the data-smoothing case a suitable crite-
rion, dependent upon more physical considera-
tions, can be obtained by minimizing the rms
error at the filter output. This criterion is
easily developed from the power spectrum ap-
proach, and in a sense it is, of course, the only
possible one as long as we follow the methods
developed thus far.
A very general theory for the minimization
of the rms error of the filter output has been
developed by Wiener.1 Since the power spec-
trum approach is not the one we shall eventu-
ally follow, however, it is not necessary to give
this analysis in detail. The nature of the relationships can be seen from an elementary computation. In Figure 5 let OA be a unit vector representing the signal component at some particular frequency. Let the amplitude ratio between the input and output of the data-smoothing filter be x, and let it be assumed that the system is phase distortionless. This can always be accomplished, at the cost of lag, by phase equalization. Then the actual signal output can be represented by OB, where OB/OA = x. Let the ratio of noise power to signal power at this frequency be $k^2$. Then the output noise can be represented by the vector BC, at some arbitrary phase angle $\theta$, where BC/OA = kx.
[Figure 5. Vector relation between input and output of data-smoothing network.]
The error in the output of the data-smoothing filter is evidently represented by the vector AC. We have
$$(AC)^2 = (OA)^2\left[(1 - x - kx\cos\theta)^2 + (kx\sin\theta)^2\right] = (OA)^2\left[(1 - x)^2 - 2kx(1 - x)\cos\theta + k^2x^2\right].$$
Since $\theta$ is random the cross-product term involving $\cos\theta$ disappears on the average. (More generally, it disappears as long as the noise and signal are uncorrelated, whether or not their relative phases are entirely random.) This leaves the mean square error as
$$\overline{(AC)^2} = (OA)^2\left[1 - 2x + (1 + k^2)x^2\right]. \qquad (1)$$
The mean square error is a minimum if
$$x = \frac{1}{1 + k^2} = \frac{P_S}{P_N + P_S},$$
where $P_S$ and $P_N$ are, respectively, the signal and noise power at this frequency. Upon substituting this result in equation (1) and remembering that $(OA)^2 = P_S$, we find that the minimum mean square error is
$$\overline{(AC)^2}_{\min} = \frac{P_S P_N}{P_S + P_N}. \qquad (2)$$
The expression for x above evidently represents the sought-for rule for the filter transmission characteristic. It is illustrated in Figure 6, where $P_N$ and $P_S$ have been chosen respectively as the flat curve and the $1/\omega^2$ curve in Figure 7. In comparison with the characteristics of typical filters in communication systems it is quite rounded, with a relatively slowly falling amplitude characteristic. More important than the detailed rule for the transmission characteristic, however, is the conclusion that the shape of the characteristic is not very critical. There is very little loss in replacing the actual curve in Figure 6 by any other similar characteristic. For example, we might validate the assumption of zero phase distortion by making use of the curve which automatically gives a linear phase shift.
[Figure 6. Optimum transmission characteristic for data smoothing assuming signals with random noise characteristics.]
[Figure 7. Signal and noise spectra assumed in Figure 6.]
A more extreme illustration is furnished by
the infinitely selective filter characteristic, with
perfect transmission in the range in which the
signal power is greater than the noise power,
and zero transmission elsewhere, indicated by
the broken lines in Figure 6.
It follows from equation (1) that in the
neighborhood of the cutoff point ω0, where $P_S = P_N$, the mean
square error for this filter is twice that of the
optimum structure. In most frequency ranges,
however, the penalty is far less than this. Since
even a two-to-one change in the mean square
error would produce no tremendous improve-
ment in the effectiveness of fire, it is clear that
the result to which we are led by this method
of attack is by no means critical.
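A short calculation makes the two-to-one comparison concrete. As an illustrative assumption patterned on Figure 7, the sketch takes a flat noise spectrum $P_N = 1$ and a signal spectrum $P_S = 1/\omega^2$, so that the cutoff point is ω0 = 1. It evaluates equation (1) for the infinitely selective filter (x = 1 where $P_S > P_N$, x = 0 elsewhere) and divides by the optimum error.

```python
def optimum_error(p_s, p_n):
    """Minimum mean square error obtained with the transmission of equation (2)."""
    return p_s * p_n / (p_s + p_n)


def ideal_cutoff_error(p_s, p_n):
    """Equation (1) with x = 1 (error P_N) or x = 0 (error P_S)."""
    return p_n if p_s > p_n else p_s


def error_ratio(w):
    p_s, p_n = 1.0 / w**2, 1.0   # assumed spectra: signal 1/w^2, flat noise
    return ideal_cutoff_error(p_s, p_n) / optimum_error(p_s, p_n)


for w in [0.1, 0.5, 0.9, 1.1, 2.0, 10.0]:
    print(f"w = {w:5.2f}   ideal/optimum = {error_ratio(w):.3f}")
```

The ratio works out to 1 plus the smaller power divided by the larger, which is why the penalty reaches two only at the cutoff and falls off quickly on either side.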
LAGS IN DATA-SMOOTHING FILTERS
The analysis just concluded has been directed
at the amplitude characteristics of a data-
smoothing filter. By virtue of the relations be-
tween the amplitude and phase characteristics
of physical networks mentioned earlier in the
chapter, however, the analysis permits us to
give at least a partial description also of the phase characteristics of the filters. This is an important consideration because it bears upon the question of time delays in data-smoothing systems which was mentioned in Chapter 7.
Figure 8. Some filter attenuation characteristics.
The general nature of the relationship in
simple cases is illustrated by Figures 8 and 9.
Figure 9. Corresponding minimum phase characteristics.
Figure 8 shows a series of rising attenuation
characteristics equivalent to rather unselective
falling amplitude characteristics of the general
type shown by the principal curve in Figure 6.
Figure 9 shows the corresponding phase char-
acteristics computed on a minimum phase shift
basis. In Figure 8 the central attenuation char-
acteristic B has been so chosen that the corre-
sponding phase characteristic in Figure 9 is
exactly a straight line at low frequencies,
where the transmitted amplitudes are appreci-
able. Curves A and C in the two drawings show
slightly different cases, but it is clear from
the figures that the tendency of the phase
characteristics to approximate linearity is still
marked.
In communication engineering a phase char-
acteristic proportional to frequency is inter-
preted as indicating a delay in seconds equal to
the slope dB/dω of the phase characteristic.
This relation is illustrated most simply by an
ideal line. The ideal line has zero attenuation
combined with a phase shift which is propor-
tional to frequency and which at any given fre-
quency is also proportional to the length of the
line in question. If we apply any arbitrary
wave to the line it is propagated down the line
with a definite velocity and unchanged wave
form. The time required for the wave to reach
any point on the line is equal to the slope of the
phase characteristic to that point.
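The ideal-line statement can be illustrated with a discrete Fourier transform. In the sketch below (numpy assumed available; the pulse shape, sample spacing, and 1.5-second slope are arbitrary choices), a smooth pulse is passed through a characteristic with zero attenuation and phase shift B = ωτ. The output is the same wave form delayed by τ seconds, apart from the wrap-around inherent in the discrete transform.

```python
import numpy as np

n, dt = 1024, 0.01                            # samples and spacing (s), assumed
t = np.arange(n) * dt
wave = np.exp(-(((t - 2.0) / 0.1) ** 2))      # smooth pulse centered at t = 2 s

omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)   # angular frequency of each bin
tau = 1.5                                     # phase slope dB/dw in seconds
ideal_line = np.exp(-1j * omega * tau)        # |Y| = 1, phase lag B = w * tau

out = np.fft.ifft(np.fft.fft(wave) * ideal_line).real
shift = int(round(tau / dt))                  # delay expressed in samples

print(t[np.argmax(wave)], t[np.argmax(out)])
print(np.allclose(out, np.roll(wave, shift), atol=1e-9))
```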
In a structure like a filter, which has an at-
tenuation characteristic varying with fre-
quency, it is of course no longer possible to
transmit an arbitrarily impressed wave with-
out change in wave shape. Even if the applied
wave is merely a suddenly applied d-c voltage
or single frequency sinusoid, there is a tran-
sient period before the response approximates
its final value. In structures having a substan-
tially linear phase characteristic over any fre-
quency range in which they exhibit an appreci-
able amplitude response, however, this total
transient characteristic falls naturally into two
parts. The first is a waiting period equal to the
slope of the phase characteristic, during which
the response is very small, whereas the second
is a true transient period in which the response
is substantial but does not resemble the final
steady-state response. This is illustrated by
Figure 10, which shows the voltage at the fifth section of a conventional low-pass filter in response to a d-c voltage applied at zero time at the input terminals.1" The end of the waiting period, as deduced from the slope of the phase characteristic, is indicated by the broken line.
Figure 10. Voltage at fifth section of conventional low-pass filter in response to unit d-c voltage.
Delays of the sort just illustrated must be
expected in a data-smoothing filter whenever
the nature of the signal is changed. This hap-
pens at the beginning of tracking, in changing
from one target to another, or even in follow-
ing a single target when the target makes an
abrupt change in course. Since usable data in
a fire-control system must be quite accurate,
the delay to be allowed for must include both
the initial waiting period and the subsequent
transient period until the transient ripples
have almost vanished. A considerable part of the art of designing data-smoothing networks consists in controlling the design so that these final transient ripples decay relatively rapidly. We are not yet ready to discuss this problem. It will turn out, however, that the minimum interval which can be assigned to the "true transient" period is about equal to that which must be allowed for the initial waiting period.ᶜ Thus the slope of the phase characteristic can
be used as an index of the lags which must be
expected in data smoothing merely by doubling
the delay to which the slope would normally be
said to correspond.
When we use the phase slope as an index of
delay it becomes immediately apparent that
lags are the necessary consequence of smoothing in physical circuits. This is easily seen by reference to the relations which must exist between attenuation and phase characteristics in
physical structures. An example is provided by
the formula15*1
$$\left(\frac{dB}{d\omega}\right)_{\omega=0} = \frac{2}{\pi}\int_0^{\infty}\frac{A - A_0}{\omega^2}\,d\omega, \tag{3}$$
where A is attenuation, $A_0$ is the attenuation at zero frequency, and B is phase shift. In other words, the delay (measured by the slope of the phase characteristic at zero frequency) is proportional to the integral of the attenuation on an inverse frequency scale when the attenuation at zero frequency is taken as the reference. The equation thus states that the system will exhibit a lagging response as long as there is a net high-frequency attenuation. As a numerical illustration, let it be supposed that A is zero below ω = 1. This corresponds to the estimate made earlier in the chapter that the input signal components in antiaircraft work lie roughly in the band below about 0.1 or 0.2 cycle per second. Let it be supposed also that A at higher frequencies is equal to 3 nepers, corresponding to an average amplitude reduction of about 20 to 1. Then dB/dω at the origin is given from equation (3) as 6/π seconds, and in accordance with the rule just enunciated the minimum delay to be expected from such a structure in a data-smoothing application would consequently be 12/π seconds.
ᶜ This is not intended to imply that the distinction between the initial waiting period and the "true transient" period is quite as sharp as it is in Figure 10. The selectivity in a data-smoothing filter is usually not great enough to justify the assumption that components beyond the linear phase region are of negligible importance.
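Equation (3) is easy to evaluate numerically for an arbitrary attenuation curve. The sketch below is one simple way to do so; the substitution ω = tan θ and the midpoint rule are choices made here, not taken from the text. Applied to the step attenuation of the example, it reproduces the 6/π-second phase slope and the 12/π-second delay estimate.

```python
import math


def phase_slope(attenuation, a0=0.0, n=200_000):
    """(dB/dw) at w = 0 from equation (3): (2/pi) * integral of (A - A0)/w^2.

    With w = tan(theta), dw / w^2 = dtheta / sin(theta)^2, which maps the
    infinite frequency range onto (0, pi/2); the midpoint rule is used there.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * h
        total += (attenuation(math.tan(theta)) - a0) / math.sin(theta) ** 2
    return (2 / math.pi) * total * h


def step(w):
    """Attenuation of the example: 0 below w = 1, 3 nepers above."""
    return 0.0 if w < 1.0 else 3.0


slope = phase_slope(step)
print(slope, 6 / math.pi)        # phase slope in seconds
print(2 * slope, 12 / math.pi)   # doubled to cover the transient period
```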
Aside from such specific quantitative relations, equation (3) is useful as a basis for a
number of important qualitative conclusions.
One, for example, is the fact that although a
lag is a necessary concomitant of any system
showing a high-frequency attenuation, the
amount of the lag depends greatly upon the
portion of the frequency spectrum in which
the attenuation is found. Since the integral is
taken on an inverse frequency scale, a small
attenuation at low frequencies is much more
important than a considerably greater attenua-
tion further out in the spectrum. This points to
the desirability of designing tracking instru-
ments which generate principally high-fre-
quency noise, even if the amplitude of the noise
is somewhat increased thereby. We may also
notice that since the attenuation is a logarith-
mic function of amplitude an initial moderate
reduction in the amplitude of disturbing noise
may be much less expensive in lag than subse-
quent attempts at further reduction. For ex-
ample, an amplitude reduction from 100 to 10
per cent over a given portion of the frequency
spectrum produces no more lag than a subse-
quent reduction from 10 to 1 per cent.
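Both qualitative conclusions follow directly from the 1/ω² weighting in equation (3). For a constant attenuation of G nepers confined to a band ω1 < ω < ω2, the integral gives a phase slope of (2/π)G(1/ω1 − 1/ω2). The sketch below evaluates that expression for two illustrative bands (the band edges are assumptions) and shows that the 100-to-10 and 10-to-1 per cent reductions each amount to ln 10 nepers.

```python
import math


def band_lag(nepers, w1, w2=math.inf):
    """Phase slope (s) from equation (3) for constant attenuation in w1 < w < w2."""
    inv_w2 = 0.0 if math.isinf(w2) else 1.0 / w2
    return (2 / math.pi) * nepers * (1.0 / w1 - inv_w2)


low_band = band_lag(1.0, 0.5, 1.0)     # 1 neper between w = 0.5 and 1
high_band = band_lag(1.0, 5.0, 10.0)   # the same attenuation a decade higher
print(low_band, high_band, low_band / high_band)

# Amplitude reductions expressed in nepers (natural log of the amplitude ratio).
first_cut = math.log(1.00 / 0.10)      # 100 per cent -> 10 per cent
second_cut = math.log(0.10 / 0.01)     # 10 per cent -> 1 per cent
print(first_cut, second_cut)
```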
WIENER'S PREDICTION THEORY: ZERO NOISE CASE
In Chapter 7 we distinguished between what
we called the simple data-smoothing problem
and the data-smoothing and prediction prob-
lem. The simple problem, with which this re-
port is chiefly concerned, is the one which has
been given principal attention thus far. On
account of its broad interest, however, it seems
worthwhile to include also a brief statement
of Wiener's solution of the general problem.
The method of development used here is intui-
tive and nonrigorous in comparison with
Wiener's own development, but it permits the
principal relations to be established by very
elementary means.
It is convenient to consider first the zero
noise case. The past history of the signal, then,
is known perfectly, and the existence of a
prediction problem depends entirely upon the
fact that since the signal is assumed to be sta-
tistical in character, its future is not com-
pletely determined from its past. The situation
can be thought of in the terms suggested by
Figure 11. The actual signal output appears at P1. In accordance with the discussion earlier in the chapter, we imagine this signal to be generated by passing flat noise through the shaping network N1. The transfer admittance Y1(iω) of N1 is determined from the power spectrum of the signal by the procedure outlined earlier and is a minimum phase shift characteristic. It will be recalled that minimum phase shift transfer admittances have the important property that their reciprocals are also the transfer admittances of physically realizable networks.
Figure 11. Schematic representation of Wiener's prediction theory when there is no noise. (Blocks: flat noise source, shaping network N1, network N2, predicting network N3.)
From Y1 we can readily compute the transient response characteristic of N1. We shall assume for illustrative purposes that the impulsive admittance of N1 takes the special shape shown by Figure 12.
Figure 12. Assumed impulsive admittance of
shaping filter.
The flat noise is thought of as consisting of
a large number of elementary impulses with
random amplitudes and occurring at random
times. For the purposes of this analysis, how-
ever, it is sufficient to consider only the three
unit impulses shown in Figure 13. Impulse B is supposed to occur at the instant at which the prediction is to be made, A occurs two seconds in the past, and C, one second in the future. The response of N1 to these three impulses will evidently be three curves of the sort given by Figure 12, suitably displaced in time as shown by Figure 14.
STEADY-STATE ANALYSIS OF DATA SMOOTHING
Figure 13. Impulses giving rise to applied signal through shaping filter.
The desired output of the predicting network is the curve of Figure 14 advanced by the prediction time, which we can assume, for illustration, to be two seconds. It may be assumed for the sake of preliminary analysis that the input of the predicting network is the three original impulses of Figure 13. The terminal P3 at which they are supposed to appear is of course a purely fictitious one and is not accessible to us physically. We can, however, construct the equivalent terminal P'3 by impressing the actual signal from terminal P1 on the network N2, whose transfer admittance is the reciprocal of that of N1.
Figure 14. Applied signal at P1. (Curves due to the individual impulses and their sum.)
Let the predicting network connected to terminal P'3 be represented by N3. Obviously a perfect prediction would be secured if N3 could be assigned the impulsive admittance shown in Figure 15, that is, an impulsive admittance equal to the impulsive admittance of the original network but moved forward by the 2-second prediction time. Then all the constituent curves and the sum curve in Figure 14 would similarly be moved forward. Of course, we cannot assign N3 an impulsive admittance which is different from zero at negative times without postulating a nonphysical network. It is, however, perfectly possible to define N3 from the portion of the impulsive admittance characteristic at positive times, with the remainder set equal to zero. This gives an impulsive admittance of the type shown by Figure 16. When energized by the three unit impulses, it gives the result shown in Figure 17. The contributions of impulses A and B are not affected by the absence of a negative-time portion of the impulsive admittance, but the contribution of impulse C is lost.
Figure 15. Ideal impulsive admittance of prediction network N3 in Figure 11.
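The construction of N3 can be mimicked in discrete time. The sketch below works under stated assumptions: it takes a hypothetical exponential impulsive admittance W(t) = e^(−t) in place of the shape of Figure 12, represents the flat noise by independent random impulses (numpy assumed), and, since N2 recovers those impulses from the signal, applies the advanced and truncated admittance directly to them. Up to sampling fluctuation, the measured mean square error should match the energy of the discarded portion of W, that is, the lost contributions of impulses such as C.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.05                       # time step, s (assumed)
alpha = 2.0                     # prediction time, s, as in the text
a = int(round(alpha / dt))      # prediction time in samples
W = np.exp(-np.arange(0.0, 20.0, dt))   # hypothetical impulsive admittance of N1

n = 400_000
impulses = rng.standard_normal(n) * np.sqrt(dt)   # flat noise source
signal = np.convolve(impulses, W)[:n]             # output of shaping network N1

# N3: W advanced by alpha, with the part that would fall at negative time dropped.
W3 = W[a:]
prediction = np.convolve(impulses, W3)[:n]

m = len(W)                                        # skip the start-up transient
error = signal[m + a:] - prediction[m:n - a]      # value alpha later minus prediction
measured = np.mean(error**2)
discarded = np.sum(W[:a] ** 2) * dt               # energy of the dropped portion

print(measured, discarded)
```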
To formulate a physical prediction network we have merely to find by conventional methods the steady-state admittance Y3 corresponding to the impulsive admittance of Figure 16. The two networks N2 and N3 may then be combined to give a single structure with the transfer admittance Y2Y3 = Y3/Y1, which will give the complete prediction when energized by the actual signal.
Figure 16. Realizable portion of required impulsive admittance.
Figure 17. Response of realizable prediction network.
The mean square error in prediction is easily determined from the fact that the contributions of all impulses of the sort represented by C, occurring in the prediction interval, are lost. Since impulses in the flat noise source occur at random times, the mean square error is proportional to $\int_0^{\alpha} W^2(t)\,dt$, where α is the prediction time and W is the impulsive admittance of the shaping network N1 (Figure 12). Since the flat noise impulses occurring after the time at which the prediction is made are surely unpredictable, it is clear that this error is the least we could expect any physical prediction network to have.
WIENER'S THEORY: GENERAL CASE
When the input data includes noise as well as the signal, it is natural to think of the situation in the manner shown by Figure 18. The first source of flat noise, together with its shaping network, is the combination we have already used to represent the signal in the noise-free case. The addition of noise is represented by the second independent source of flat noise with its associated shaping network. They combine to give the total input measured at P1.
Figure 18. Circuit representation of random functions representing signal and noise.
This diagram emphasizes the fact that we
think of the noise and signal as originating
from different physical sources. By postulate,
however, we are not able to separate the
sources experimentally. So far as any observed
result is concerned, consequently, we may as
well deal with the simplified structure shown
in Figure 19, which contains a single source of flat noise and a single shaping network. The transfer admittance of the shaping network N1 is determined by adding the power spectra of signal and noise, converting the result to an amplitude characteristic, and computing the corresponding minimum phase according to methods already used for the noise-free case.ᵈ
Figure 19. Schematic representation of Wiener's prediction theory when there is noise.
ᵈ Note that the shaping network thus obtained is not the same as the one we would secure by adding the transfer admittances of the two shaping networks in Figure 18 directly. In order to realize the same total power at P1 in each case, it is necessary to begin by adding the powers rather than the amplitude characteristics associated with the two paths.
Although we cannot separate the signal from the noise completely, we saw earlier that the
mean square difference between the total input
and the signal is minimized if we multiply the
amplitude of the input at each frequency by
the ratio of the signal power to the sum of the
signal and noise powers. A fictitious filter
having the prescribed amplitude characteristic
is represented by N4 in Figure 19. We assigned N4 a zero phase characteristic so that there may be no lag in producing the result at P3. Thus the output at P3 at any instant represents the best conceivable estimate (in the least squares sense) of the signal at that instant. The assumption of zero phase, of course, makes N4 nonphysical, since it must have at least the
minimum phase characteristic associated with
its prescribed amplitude characteristic. This,
however, is not an objection here since the
structure is introduced purely for purposes of
analysis.
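Because N4 is only a device of analysis, its effect can be reproduced off line, where the whole record is available. The sketch below (numpy assumed; both spectra are hypothetical choices) builds random-noise signal and noise records with prescribed power spectra, multiplies the transform of their sum by the power ratio of equation (2) with zero phase, and compares the resulting mean square error with that of the raw input and with the average of $P_S P_N/(P_S + P_N)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt = 2**14, 0.1
omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)

p_s = 1.0 / (1.0 + omega**2)   # hypothetical signal power spectrum
p_n = np.full(n, 0.2)          # hypothetical flat noise power spectrum


def random_function(power):
    """Random-noise record whose power spectrum is proportional to `power`."""
    flat = np.fft.fft(rng.standard_normal(n))
    return np.fft.ifft(flat * np.sqrt(power)).real


s = random_function(p_s)
y = s + random_function(p_n)

gain = p_s / (p_s + p_n)       # equation (2) with zero phase: the role of N4
s_hat = np.fft.ifft(np.fft.fft(y) * gain).real

raw_error = np.mean((y - s) ** 2)
smoothed_error = np.mean((s_hat - s) ** 2)
predicted = np.mean(p_s * p_n / (p_s + p_n))
print(raw_error, smoothed_error, predicted)
```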
The situation is now reduced to a form in
which it is substantially equivalent to the one
appearing in the zero-noise case. We assume a series of random impulses at P2, which would produce responses at P3. The problem is that of advancing the response to each impulse so that the same result appears α seconds earlier at terminal P4. The solution is represented by networks N2 and N3, which discharge functions similar to those of the correspondingly labeled networks in Figure 11. Thus, the network N2 is the reciprocal of N1 and is provided to make terminal P'2 equivalent to P2 as a source of impulses. Network N3 is defined by an impulsive admittance obtained from the impulsive admittance between P2 and P3 by advancing the latter characteristic α units in time and then discarding the portion at negative time.
In this procedure there is only one point at
which the situation differs from that without
noise. In the noise-free case, the original im-
pulsive admittance which we wished to advance
in time was identically zero at negative times.
In order to secure a physically realizable re-
sult, we needed only to discard the portion of the
impulsive admittance between t = 0 and t = α. In the present situation, on the other hand, the impulsive admittance is taken from a path including the nonphysical network N4. Thus the admittance may be expected to take such a form as that shown in Figure 20, with nonzero amplitudes at both negative and positive times, and in order to secure a physical final network it is necessary to discard everything to the left of the line α.
Figure 20. Typical impulsive admittance of best smoothing network N4 in Figure 19.
This difference in the impulsive admittance
characteristics has two consequences. The first
is the fact that since the uncertainty of the
prediction is measured by the amount of im-
pulsive admittance which must be discarded,
it is evidently greater in the present case where
we are discarding much more. The second is
the fact that in the noise-free case uncertainty
exists only for a positive prediction time. A
negative prediction time, which corresponds, of
course, to the determination of the value as-
sumed by the signal at some time in the past,
can be set into the analysis as easily as a posi-
tive prediction time, merely by shifting the im-
pulsive admittance to the right rather than the
left. In the noise-free case, however, there is
nothing to be discarded when we shift to the
right, since the impulsive admittance with
which we begin is in any case identically zero
for negative times. Thus the uncertainty in
the determination of any past value of the sig-
nal is zero. Since we have postulated no noise
to confuse the data, this is, of course, an
inevitable result. As soon as noise is included,
on the other hand, there is no such sharp distinction between the future and the past.ᵉ The
uncertainty in the determination of the true
value of the signal in the near past is almost
as great as it is in estimating what the signal
will be in the near future. As we go further
and further into the past the uncertainty gradually diminishes. If we can allow ourselves unlimited lag, we at length reach a point at which the discarded portion of the impulsive admittance characteristic is negligibly small. This, however, does not mean that all uncertainties have disappeared, but merely that we can base our estimate of the signal upon the power-ratio rule developed previously.
ᵉ This statement is to be understood in a physical rather than a mathematical sense. It is not intended to imply that there may not be sharp changes of behavior in the impulsive admittance at zero.
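The way the uncertainty shrinks as lag is allowed can be reproduced with a discrete, finite-memory counterpart of the problem. The sketch below substitutes a different technique for the continuous theory of the text: a least-squares FIR filter obtained by solving the normal (Wiener-Hopf) equations. The signal model (a first-order autoregressive sequence), the noise level, and the filter length are assumptions made for illustration, and numpy is assumed available. Positive d means prediction; negative d means estimating a past value with a lag of −d samples.

```python
import numpy as np


def fir_wiener_mse(d, taps=80, a=0.98, noise_var=5.0):
    """Mean square error of the best FIR estimate of s[k + d] from y[k], ..., y[k - taps + 1].

    Signal: s[k] = a s[k-1] + w[k] with unit-variance white w (an assumed model).
    Observation: y = s + v, white v with variance noise_var, independent of s.
    """
    def r_s(lag):                                   # signal autocorrelation
        return a ** np.abs(lag) / (1.0 - a * a)

    j = np.arange(taps)
    R = r_s(j[:, None] - j[None, :]) + noise_var * np.eye(taps)   # E[y y^T]
    p = r_s(d + j)                                                 # E[s[k+d] y[k-j]]
    h = np.linalg.solve(R, p)                                      # normal equations
    return float(r_s(0) - p @ h)


for d in [4, 2, 0, -2, -5, -10]:
    print(f"d = {d:3d}   mse = {fir_wiener_mse(d):.4f}")
```

The error grows with prediction time and keeps falling as more lag is accepted, but it does not fall to zero: with unlimited lag the estimate approaches the power-ratio rule, which still leaves a residual error.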
OVERALL CHARACTERISTICS OF PREDICTING NETWORKS
It has been fairly easy to develop a qualitative picture of the general characteristics of
typical data-smoothing networks. As we have
seen, they have amplitude characteristics of the
low-pass filter type combined with lagging
phase shifts. No corresponding qualitative pic-
ture of the characteristics of a typical overall
predicting circuit has, however, been developed
as yet. The discussion just concluded provides
a rule for determining the characteristics of a
predicting circuit in any given case, but pro-
vides comparatively little in the nature of a
description of the result we may expect to
secure.
In any particular situation we can, of course,
calculate the overall characteristics of the pre-
dicting circuit. A simpler way of character-
izing the overall predictor characteristic quali-
tatively, however, is based upon the use of the
attenuation-phase relations for physical net-
works. We need merely use such an equation
as (3) backward. Thus, we have previously
shown that a positive phase slope corresponds
to a lagging output. Correspondingly, a negative phase slope can be interpreted to represent a lead, or in other words, a prediction.ᶠ If we assign $(dB/d\omega)_{\omega=0}$ in equation (3) a negative value, we see that $A - A_0$ must on the average be negative. In other words, the amplitude characteristic of an overall prediction circuit must rise, on the average, as we proceed upward from zero frequency. This is in marked contrast to a data-smoothing network, which, as we have seen, tends to have a low-pass filter type of characteristic with a falling amplitude characteristic at high frequencies. The increased amplitude of response may have two detrimental effects. In the first place, it evidently produces a distorting effect on any signal components to which it applies. In the second place, it produces an exaggerated response to noise.
ᶠ This, of course, does not mean that a network with a negative phase slope can predict a perfectly arbitrary event. We can hope to realize a negative phase slope, in combination with a flat amplitude characteristic, over only a finite band. The spectrum of an arbitrary event, that is, any suddenly applied signal, will always include important components running out to infinite frequency, where the negative phase slope can no longer be realized. The statement does, however, mean that if we suddenly apply a signal made up of one or more low-frequency sinusoids, and wait for the steady state to become established, the output will appear to lead the input by a time equal to the slope of the negative phase characteristic.
Examples of the characteristics of overall prediction circuits are readily constructed by reference to the circuit of Figure 21. Various particular results are obtained by assigning particular characteristics to the data-smoothing network. Thus, if the data-smoothing network is absent entirely, the transmission through the path containing the differentiator is $i\omega t_p$, where $t_p$ is the prediction time, since differentiation is equivalent to multiplication by iω. The attenuation of the overall circuit is consequently $A = -\log|1 + i\omega t_p|$. This is plotted as curve I of Figure 22. The increasing amplitude characteristic at high frequencies is obviously due fundamentally to the increased transmission through the differentiator circuit.
Figure 21. One-dimensional prediction circuit with data-smoothing networks.
If the data-smoothing network is assigned the characteristic $(1 + i\omega a)^{-1}$, corresponding to a very simple low-pass filter type of response, the overall transmission becomes that shown by curve II in Figure 22. (It is assumed that $a = t_p$, for simplicity.) The negative attenuation at high frequencies is much reduced. This is paid for by an increased amplitude of response at low frequencies, but since the integration in (3) takes place on an inverse frequency scale, the low-frequency increment is much less than the gain reduction at high frequencies. Curve III shows the result when the data-smoothing network is assigned the characteristic $(1 + i\omega a)^{-2}$. Finally, curve IV shows the result obtainable when there is also a filter in the present-position circuit (as shown by the broken lines in Figure 21), so that there may be a net positive attenuation at high frequencies.
Figure 22. Attenuation characteristics of prediction circuit shown in Figure 21. (Vertical axis: loss.)
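Equation (3) can be checked against these circuits. The sketch below assumes, as the description of Figure 21 suggests but does not state outright, that the data-smoothing network sits in the differentiator path, so that the overall transmissions are $1 + i\omega t_p$, $1 + i\omega t_p/(1 + i\omega a)$, and $1 + i\omega t_p/(1 + i\omega a)^2$ with $a = t_p = 1$ second. Although the three attenuation curves differ greatly, each integrates through equation (3) to the same phase slope of about $-t_p$, that is, the same lead.

```python
import math


def phase_slope(attenuation, n=100_000):
    """Equation (3) with A0 = 0, using w = tan(theta) and the midpoint rule."""
    h = (math.pi / 2) / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * h
        total += attenuation(math.tan(theta)) / math.sin(theta) ** 2
    return (2 / math.pi) * total * h


t_p = a = 1.0   # prediction time and smoothing time constant (s), assumed equal


def loss(transmission):
    """Attenuation in nepers, A = -ln|Y(iw)|, for a transmission Y(s)."""
    return lambda w: -math.log(abs(transmission(1j * w)))


curves = {
    "I": loss(lambda s: 1 + s * t_p),
    "II": loss(lambda s: 1 + s * t_p / (1 + s * a)),
    "III": loss(lambda s: 1 + s * t_p / (1 + s * a) ** 2),
}
slopes = {name: phase_slope(A) for name, A in curves.items()}
print(slopes)
```

All three transmissions are minimum phase, which is what entitles us to apply equation (3) to them.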
In view of the inverse frequency scale in (3),
the gross negative attenuation will be mini-
mized if the negative attenuation region is
placed very close to zero frequency. This, how-
ever, means that much of the signal energy
falls in the negative attenuation region so that
in certain respects, at least, the signal response
must be seriously injured. For example, in the
specific circuits just discussed we can place the
negative attenuation region at very low fre-
quencies by choosing very long time constants,
a, in the data-smoothing networks, with the
consequence that the circuits will operate cor-
rectly for any long continued straight line path,
but will be very sluggish in changing from one
straight line to another. If the negative attenu-
ation region is placed at higher frequencies, on
the other hand, the signal response is improved
but beyond certain limits the circuit becomes
unbearably sensitive to noise.
Quantitative illustrations of these relation-
ships are quickly constructed. Suppose, for ex-
ample, that the prediction time is 2 seconds.
From (3) this is consistent with an attenuation characteristic having zero attenuation below ω = 1 and a net gain of π nepers thereafter. In other words, the amplitudes of all frequencies above ω = 1 are increased by a factor of about 23 to 1. If the region of added gain is pushed to a higher frequency or concentrated within a narrow band, the multiplying factor rapidly becomes larger. For example, if we maintain A at approximately zero below ω = 2, the average gain above this point must be 2π nepers, corresponding to a multiplying factor of about 500 to 1. We secure the same factor by attempting to concentrate the region of negative attenuation in the band between ω = 1 and ω = 2. The multiplying factor also goes up rapidly as we increase the prediction time. For example, with the gain uniformly spread over the frequency region above ω = 1, the multiplying factor is 500 for a prediction time of 4 seconds, or more than 10,000 for a prediction time of 6 seconds.
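The multiplying factors quoted above follow from equation (3) in closed form. If the negative attenuation is a uniform G nepers between ω1 and ω2 and zero elsewhere, equation (3) gives a lead of (2/π)G(1/ω1 − 1/ω2) seconds. A prediction time T therefore requires G = πT/[2(1/ω1 − 1/ω2)], and the amplitude factor is e^G. The sketch below (function names are illustrative) tabulates the cases in the text.

```python
import math


def required_gain(lead, w1, w2=math.inf):
    """Uniform negative attenuation (nepers) over w1 < w < w2 giving `lead` seconds."""
    span = 1.0 / w1 - (0.0 if math.isinf(w2) else 1.0 / w2)
    return math.pi * lead / (2.0 * span)


def lead_from_gain(gain, w1, w2=math.inf):
    """Inverse relation: lead (s) produced by `gain` nepers over w1 < w < w2."""
    span = 1.0 / w1 - (0.0 if math.isinf(w2) else 1.0 / w2)
    return 2.0 * gain * span / math.pi


cases = [
    ("2 s, gain above w = 1", 2, 1, math.inf),
    ("2 s, gain above w = 2", 2, 2, math.inf),
    ("2 s, gain in 1 < w < 2", 2, 1, 2),
    ("4 s, gain above w = 1", 4, 1, math.inf),
    ("6 s, gain above w = 1", 6, 1, math.inf),
]
for label, lead, w1, w2 in cases:
    g = required_gain(lead, w1, w2)
    print(f"{label}: {g / math.pi:.1f} pi nepers, factor {math.exp(g):,.0f}")

print(lead_from_gain(math.pi, 0.2))   # pi nepers carried down to w = 0.2
```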
Reasonable multiplying factors with long
prediction times can be obtained only by carry-
ing the negative attenuation region to very low
frequencies. As indicated previously, the cost
of this is an increase in the time required for
the signal to change from one constant or
nearly constant value to another. For example, in the first illustration above, if the region of π nepers net gain is carried down from ω = 1 to ω = 0.2, the integral in (3) is just five times as great as it was before, so that the characteristic corresponds to a prediction time of 10 rather than 2 seconds. This change would correspond to an increaseᵍ from perhaps
4 or 5 to perhaps 20 or 25 seconds in the time
required for the circuit to settle from one con-
stant value to another.
Practical examples of the transmission characteristics of overall prediction circuits, with particular emphasis on the dominant effect of even very small negative attenuations at extremely low frequencies, are shown later in Figures 5 to 8, inclusive. In the linear predictor, $A - A_0$ varies as $-k\omega^2$ near zero, and it is easily seen that such a term makes a finite contribution to the integral in (3). On the other hand, the attenuation of the quadratic predictor, which is capable of dealing exactly with polynomial functions of time of the second degree or less, is necessarily zero near the origin apart from terms of the order of ω⁴,ʰ so that the integral in this region can be neglected. This slight difference between the two characteristics at frequencies of the order of 0.01 cycle per second and below is sufficient to balance the obviously greater negative attenuation of the quadratic predictor at higher frequencies.
ᵍ Only rough numbers can be given, since circuits with the square-cornered attenuation characteristics chosen for illustrative purposes would have very ripply transient characteristics, corresponding to no very well marked settling time.
ʰ … of Quasi-Distortionless Prediction Networks in Appendix A.
Chapter 9
THE ASSUMPTION OF ANALYTIC ARCS
The discussion in the previous two chapters has been based upon the assumption
that the least squares criterion forms a suita-
ble measure of performance for a predicting
network. This assumption permitted us to re-
strict our attention to the amplitude spectra
of the signal and noise, leaving phase relations
entirely out of account. Thus, both signal and
noise could be thought of as "random noise"
functions characterized by random phases and
Gaussian distributions, as described in the
preceding chapter. So far as the noise is con-
cerned, there seems to be nothing wrong with
this assumption. In the case of the signal, how-
ever, it appears that significant phase relations
may exist. This chapter will consequently set
up an alternative analysis which permits the
significance of possible phase relations in the
target paths to be estimated.
The alternative analysis is based upon the
assumption that the target courses are sequen-
ces of analytic segments of different lengths
joined together. These segments are simple
predictable curves such as straight lines, pa-
rabolas, and circles. Significant phase relations
are implied by the assumption that there are
sudden changes from one type of course to
another.
This picture of target paths is, of course,
extreme. There are no such sharp discontinui-
ties between one segment and another, nor do
airplanes fly perfectly along simple curves
even for limited periods. Nevertheless, it is
the conception of target courses upon which
the rest of our analysis is based. The reasons
for believing that it is a closer approximation
to actual target courses than, say, a random
noise function with the same power spectrum
would be, are given later. Perhaps more im-
portant is the fact that the possibility of hit-
ting an airplane flying along such a simple
analytic arc is much greater than it would be
if we were attempting to predict a correspond-
ing random noise function. It is thus advan-
tageous to take the analytic arc assumption as
a basis for designing the prediction circuit,
even if the assumption seems to be reasonably
well justified over only occasional segments of
actual target paths. An example of such a
situation is furnished by the bombing run
illustration described in Chapter 7.
As a corollary to the analytic arc assumption it is also assumed that the theoretical
predicted point must be quite close to the actual
target position if the probability of scoring a
hit is to be appreciable. In other words, such
dispersive factors as random errors in com-
puter or gun or the lethal radius of the shell,
which would tend to produce occasional hits at
long distances from the theoretical predicted
point, are quite small. This is such a plausible
assumption in the light of present-day antiair-
craft experience that its critical importance in
the present argument is likely to go unper-
ceived. However, this is the assumption which
limits consideration to small errors in predic-
tion, whereas the least squares criterion natu-
rally gives greatest emphasis to large errors.
If, for example, antiaircraft projectiles were
suddenly endowed with a much greater de-
structive radius, we would be much more in-
terested in fairly large misses, and the objec-
tions to the least squares criterion would disap-
pear.
These postulates are discussed in more detail
in the following sections. In anticipation of
this discussion the following conclusions may
be mentioned:
1. With the assumptions as stated, the pre-
diction should be on a modal rather than a
least squares basis. In other words, the gun
should be aimed at the most probable future
position of the target.
2. Modal prediction requires evaluation of
the parameters of the analytic arc the target
is at present traversing. This can be accom-
plished by smoothing the values of these pa-
rameters evaluated for a period in the past.
3. If the smoothing is performed by linear
invariable networks, the impulsive admittances
of these networks should have a definite cutoff
after a finite smoothing time. By this means
all data over a certain age are given zero weight.
The method of calculating the proper smoothing
time is developed.
4. Definite advantages can be obtained from
circuits with variable smoothing times if such
systems can be satisfactorily mechanized.
CONFIDENTIAL
THE TARGET COURSES
The target courses, like the tracking errors,
can be thought of as a statistically generated
set of functions — that is, a stochastic process.
The structure of this process is, however, very
different from that of the tracking errors. It
is by no means satisfactory to assume the
target courses to be equivalent to a random
noise having the same power spectrum as the
target courses. As we pointed out in Chapter
7, the target is piloted by a purposeful human
being. It tends to follow a definite simple curve
for a period of time and then to shift to a new
simple curve. Much of the flight is in attempted
straight lines with constant velocity. Most of
the remainder can be considered to be segments
of circles or helices in space, or as segments of
parabolas or higher degree curves. Straight
line constant speed flight corresponds to the
airplane controls in a neutral position. The
helical flight is a natural generalization allow-
ing arbitrary, but fixed, positions of the con-
trols. The curves which are parabolic functions
of time correspond to constant acceleration in
the three space coordinates. Thus, all these
assumptions have a reasonable physical back-
ground.
Most antiaircraft computers are constructed
on the assumption of straight line flight, al-
though some work has been done in World
War II on curved flight directors both with the
helical and the parabolic assumptions. There is
not a great deal of difference in these two
generalizations from the practical point of
view, since determination of acceleration terms
is subject to such large errors in any case.
The important part of this representation
of the target courses is that they consist of
segments of simple analytic curves joined to-
gether. The individual segments are completely
predictable if we have a part of the segment
given exactly. One need merely evaluate the
parameters of the segment from the given part
and evaluate the curve for t + t_f. The unpredictable part of the target courses is due to the
possibility of sudden changes from one segment
to another. With random noise functions the
unpredictability is present continuously.
This simplified description of the target
courses as piecewise analytic functions must
be recognized as only a first approximation. A
more complete description of the target course
would include the "fine structure," the con-
necting curves between the various analytic
segments and the deviations from the segments
due to random air disturbances and similar
causes. This latter effect, the wandering of the
target from its intended path, might be reason-
ably well represented by the addition of a
random noise function to the piecewise analytic
functions described above.
THE POISSON DISTRIBUTION OF SEGMENT END POINTS
The analytic segments of which the course
is supposed to consist are not all of the same
duration — we may assume some probability
distribution of the duration of these segments.
The simplest assumption here is that the
breaks occur in a Poisson distribution in time.
This assumption is not necessary for our
analysis but is a reasonable one and leads to
a simple mathematical treatment. Any other
reasonable distribution would give comparable
results.
A series of events is said to occur in a
Poisson distribution in time if the periods be-
tween successive events are independent in the
probability sense and are controlled by a distri-
bution function
$$p(l)\,dl = \frac{1}{a}\,e^{-l/a}\,dl .$$
Here p(l) dl is the probability of an interval of
length between l and l + dl. This means that
the frequency of intervals of a given length is
a decreasing exponential function of the length.
This type of distribution is familiar in physics
as describing the decay of radioactive substances. The time a in the distribution function
is the average length of the intervals, since
$$\int_0^{\infty} l\,\frac{1}{a}\,e^{-l/a}\,dl = a .$$
It is related to the "half life" b of the interval
by
$$b = a \ln 2 .$$
The single number a completely specifies the
Poisson distribution. The events may be said
to be happening as randomly as possible apart
from the fact that they occur at an average
rate of 1/a per second.
Another way of describing a Poisson distri-
bution of events is the following. The probabil-
ity of an event in a small interval of duration
dl is (1/a) dl and is independent of whether or
not events have occurred in any other nonover-
lapping intervals.
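A minimal Python sketch of the Poisson assumption, using the standard library's exponential sampler; the average segment length a = 20 s is a hypothetical value, not a figure from the text. It checks numerically that the mean interval comes out near a and that about half the intervals are shorter than the half life b = a ln 2.

```python
import math
import random

def sample_segment_lengths(a: float, n: int, seed: int = 1) -> list[float]:
    """Draw n segment durations from p(l) = (1/a) exp(-l/a)."""
    rng = random.Random(seed)
    # expovariate takes the rate 1/a, so the mean of the draws is a
    return [rng.expovariate(1.0 / a) for _ in range(n)]

a = 20.0                    # hypothetical average segment length, seconds
lengths = sample_segment_lengths(a, 200_000)
mean = sum(lengths) / len(lengths)
b = a * math.log(2)         # half life of the interval, b = a ln 2
frac_below_b = sum(length < b for length in lengths) / len(lengths)
print(f"mean length {mean:.2f} s (a = {a} s); fraction shorter than b: {frac_below_b:.3f}")
```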
THE PROBABILITY DISTRIBUTION OF FUTURE POSITIONS
Let us suppose that we have a record of the
course of the target up to the present time and
a complete statistical description of the set of
target courses. What can then be said about the
position of the target t_f seconds from now? If
we were able to analyze the data completely
the most we could obtain would be a probability
distribution function for the future position.
This distribution function would give the prob-
ability, in the light of the course history, of
the target being at any point in space at the
future time. This function would assume large
values at likely points and low values at unlikely points. For t_f small the distribution
would be highly concentrated and for larger t_f
it would tend to spread out.
In the simple case we have been discussing,
of a Poisson distribution of sudden changes in
type of course, the distribution consists of two
parts. First, there is a spike of probability at
one point, the continuation of the present pre-
dictable segment. Second, there is a continuous
distribution which corresponds to possible
changes to a new segment during the time of
flight. As t_f increases the total probability in
the spike decreases exponentially toward zero,
and the total in the continuous part increases
exponentially toward unity. The behavior is
roughly as indicated in Figure 1.
Figure 1. Probability distribution of future position of target, assuming piecewise analytic courses.
A very different type of future position dis-
tribution is exhibited with other assumptions
about the target courses. For example, suppose
the courses were random noise functions with
the power spectrum
$$P(\omega) = \frac{1}{a^2 + \omega^2} .$$
A typical noise function with this spectrum is
shown in Figure 2. In Figure 3 is shown a
typical velocity under the other assumption,
that the courses are piecewise analytic and in
fact straight lines between breaks. If the
breaks are Poisson distributed, both Figure 2
and Figure 3 have the same power spectrum,
1/(a² + ω²). The future distribution of velocities for Figure 3 is shown in Figure 1, and for
Figure 2, it will be as shown in Figure 4. In the
random noise case the future distribution is a
Gaussian distribution with no spike. The center
of this distribution decreases exponentially to-
ward zero with increasing time of flight ac-
cording to the formula
$$X_{t_f} = X_0\,e^{-a t_f}$$
where X_0 is the present value of the function
and X_{t_f} is the mean of the future distribution.
Figure 2. Typical noise function.
The standard deviation σ of the distribution increases exponentially toward the rms value A of
the function according to
$$\sigma = A\left(1 - e^{-2 a t_f}\right)^{1/2} .$$
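The two formulas for the noise-like course can be sketched as follows, assuming the course is a stationary Gauss-Markov process whose spectrum is proportional to 1/(a² + ω²) and whose rms value is A. The function name and the parameter values are illustrative only.

```python
import math

def future_distribution(x0: float, rms: float, a: float, tf: float) -> tuple[float, float]:
    """Mean and standard deviation of X(t + tf) given X(t) = x0.

    Assumes a stationary Gauss-Markov process with spectrum proportional to
    1/(a^2 + w^2), i.e. autocorrelation rms**2 * exp(-a |tau|).
    """
    mean = x0 * math.exp(-a * tf)                           # decays toward zero
    sigma = rms * math.sqrt(1.0 - math.exp(-2.0 * a * tf))  # grows toward the rms value
    return mean, sigma

for tf in (0.0, 0.5, 2.0, 10.0):
    m, s = future_distribution(x0=1.0, rms=1.0, a=1.0, tf=tf)
    print(f"tf = {tf:4.1f}   mean = {m:.3f}   sigma = {s:.3f}")
```

At t_f = 0 the distribution collapses onto the present value; for large t_f it approaches a Gaussian centered at zero with the rms value as its spread, and at no time is there a spike.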
Supposing that this distribution function
could be determined, where should the gun be
aimed? The answer to this will depend on two
factors: the gun dispersion, and the lethal effects of the shell.
Figure 3. Typical velocity function.
If the gun is aimed to
explode the shell at a certain point in space,
the shell will not necessarily explode at that
point, but rather there will be a distribution of
positions centered about the point aimed at,
because of gun dispersion. Also, if the shell
explodes at a certain point and the target is at
another point, there will be a certain proba-
bility of lethal effect which decreases rapidly
with increasing distance between the points.
These two functions could be combined by a
product integration to give the probability of
a hit if the target is at one point and
the gun aimed to explode the shell at a second
point.
Figure 4. Probability distribution of future position of target, assuming courses with random noise properties.
To determine the probability of a hit
when aiming at a certain point, then, we should
multiply the probability of the target being at
each point in space by the probability of lethal
effect when it is at that point and integrate the
product over all space. The optimum point of
aim will be the one which maximizes this in-
tegrated product.
In one dimension this may be expressed
mathematically as follows. Let P(x) be the
future position distribution of the target, so
that P(x)dx is the probability of it being in
the interval from x to x + dx at the future time.
Let Q(x,y) be the probability of hitting the
target if the gun is aimed at point y and the
target is at point x. Then the total probability
of a hit when aiming at point y is
$$R(y) = \int_{-\infty}^{\infty} P(x)\,Q(x,y)\,dx .$$
The point of aim y should be chosen to maxi-
mize R(y).
In the cases we consider, the lethal radius of
the shell and the dispersion of the gun are both
assumed to be small in comparison with the
range of future positions if there is a change
of course during the time of flight. This means
that Q(x,y) is small unless x is very near to y.
Q(x,y) can, in fact, be considered to be a δ
function of (x − y), and the value R(y) is then
just a constant times P(y). Thus, the best
aiming point under this assumption is the most
probable future position of the target. The as-
sumption of small lethal distance is generally
valid with antiaircraft fire and ordinary chemi-
cal explosive shells.
Now the most probable future position in our
case is the spike of probability corresponding
to the analytic extrapolation of the present seg-
ment of the target course. To determine its
position one must find the parameters of this
segment and evaluate for t_f seconds in the
future. For example, if the segments are as-
sumed to be straight lines (constant velocity
target) the velocity components are determined
and multiplied by t_f to give the predicted
change in position. These changes are added to
the present position to give the future position.
If helical or parabolic segments are assumed,
the parameters of these curves are determined
from the past data, and the curves extrapolated t_f seconds into the future.
These conclusions may be contrasted with
the idea of aiming at the point which mini-
mizes the mean square error. The least squares
criterion amounts to aiming at the mean or
center of gravity of the future distribution of
position. This point will ordinarily be under
the continuous part of the distribution and not
at the spike; e.g., the point marked in Figure 1.
Its position depends to a considerable extent on
distant parts of the distribution, which would
surely be complete misses in any case. The
chief advantage of the least squares criterion
is that it fits in well with the mathematical
tools suitable to these problems, leading to
solvable equations.
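The contrast between modal and least squares aiming can be made concrete in one dimension. In this sketch every number is hypothetical: the spike carries the Poisson probability e^(-t_f/a) that the present segment lasts through the time of flight, a broad Gaussian stands in for the continuous part, and the lethal effect is modeled as a simple window of half-width r (an assumption; the text only requires Q to be narrow).

```python
import math

def normal_cdf(x: float, mu: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def hit_probability(y, r, spike_pos, spike_mass, cont_mu, cont_sd):
    """P(|target - y| <= r) when the future position is a spike plus a Gaussian."""
    p = spike_mass if abs(y - spike_pos) <= r else 0.0
    window = normal_cdf(y + r, cont_mu, cont_sd) - normal_cdf(y - r, cont_mu, cont_sd)
    return p + (1.0 - spike_mass) * window

a, tf = 20.0, 10.0                      # hypothetical segment length and time of flight, s
spike_mass = math.exp(-tf / a)          # chance the present segment lasts tf more seconds
spike_pos, cont_mu, cont_sd = 100.0, 0.0, 150.0   # hypothetical positions, yards
r = 10.0                                # hypothetical lethal half-width, yards

mean_aim = spike_mass * spike_pos + (1.0 - spike_mass) * cont_mu   # least squares aim
p_mode = hit_probability(spike_pos, r, spike_pos, spike_mass, cont_mu, cont_sd)
p_mean = hit_probability(mean_aim, r, spike_pos, spike_mass, cont_mu, cont_sd)
print(f"aim at spike: {p_mode:.3f}   aim at mean ({mean_aim:.1f} yd): {p_mean:.3f}")
```

With these values the mean lies about 40 yards from the spike, so aiming there gives up the spike's probability entirely and collects only a thin slice of the continuous part.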
The least squares criterion will still appear
in our analysis in that we attempt to smooth
our course parameters in such a way as to
minimize the mean square error in these, a
very different thing from minimizing the mean
square error in the predicted position of the
target.
NECESSITY OF A SHARP CUTOFF
The changes in the course parameters between adjacent segments can be very large.
Also, at the start of operations and in changing
from one target to another there will be large
and erratic variation of the input to the
smoothing and predicting circuits, unrelated to
the present target course. If any of these data
are used in prediction, the result will almost
surely be a miss because of the small lethal
radius of the shell. The only way to eliminate
these errors in a linear invariable system is to
have all weighting functions cut off sharply
after a short time. Then all data over a certain
age are eliminated. Hits will occur only when
the target has been on a predictable segment for
this length of time or more and remains there
at least t_f seconds in the future.
Suppose the weighting function for velocity
has a 1 per cent tail beyond the cutoff point
and that the trackers start following the target
from a zero position. Then after the smoothing
time there will be, because of the lack of exact
cutoff, a 1 per cent error in velocity. If the
time of flight were 15 seconds and the target
velocity 200 yards per second, this represents
an error of 30 yards in predicted position.
Since this is comparable to the other errors in
a typical director, we conclude that the tail of
the smoothing curve should not be much greater
than 1 per cent of its total area.
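The 1 per cent tail figure is simple arithmetic: a velocity error of 1 per cent of 200 yards per second, carried over a 15-second time of flight. The helper below is a hypothetical illustration of that product, nothing more.

```python
def tail_position_error(tail_fraction: float, speed: float, time_of_flight: float) -> float:
    """Position error from a velocity error equal to tail_fraction of the target speed."""
    velocity_error = tail_fraction * speed      # yards per second
    return velocity_error * time_of_flight      # yards

err = tail_position_error(0.01, 200.0, 15.0)
print(f"{err:.1f} yards")  # 30.0 yards
```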
CALCULATION OF THE BEST SMOOTHING TIME
Under the assumptions we have made, the
proper smoothing time to maximize the number
of hits can be determined as follows. Let P(l)
be the probability that a predictable segment
of the course lasts for l seconds or more. In
the Poisson case this function is
$$P(l) = e^{-l/a} .$$
With a given smoothing time S there will be a
certain probability of hitting the target, as-
suming it has been on the present segment for
S seconds in the past and will remain there for
t_f seconds in the future. We assume changes
in course to be so large that any change re-
sults in a miss. This probability of a hit Q(S),
provided it remains on the course, will be an
increasing function of S. Ordinarily the standard deviation will vary inversely as the square root
of the smoothing time. We have assumed the
lethal radius of the shell small compared to the
dispersion of shells about the target. The probability of a hit will then vary inversely with
the volume through which the shells are dispersed. If the gun itself had no dispersion but
all errors were due to tracking errors (and if
the tracking error spectrum is flat), the probability of a hit would then vary as KS^(3/2) for
S in the region of interest. This is because
there are three dimensions and the expected
error in each of these decreases as S^(-1/2).
With gun dispersion present, Q(S) will have
the form
$$Q(S) = K\left(\sigma_1^2 + \sigma_2^2\,\frac{a}{S}\right)^{-3/2}$$
where σ_1 is the standard deviation due to the
gun dispersion, and σ_2 (a/S)^(1/2) that due to tracking errors. The sum of the squares is the total
variance in each dimension and the three-halves power gives the total dispersion volume.
When these two functions P(l) and Q(S)
are known, the best smoothing time is that
which maximizes the product
$$P(S + t_f)\,Q(S) .$$
The first term is the probability of a predictable segment of the course lasting S + t_f seconds, and the second term is the probability of
a hit if it does last that long. Therefore, the
product is the probability of a hit with smooth-
ing time S.
In the Poisson case, with no gun dispersion,
the calculation is as follows:
$$P(l) = e^{-l/a}$$
$$P(S + t_f) = e^{-(S + t_f)/a} = A\,e^{-S/a}$$
$$Q(S) = K\,S^{3/2}$$
$$f(S) = P(S + t_f)\,Q(S) = B\,e^{-S/a}\,S^{3/2}, \qquad B = AK$$
$$f'(S) = B\left[\frac{3}{2}\,S^{1/2} - \frac{1}{a}\,S^{3/2}\right]e^{-S/a} = 0$$
$$S = \frac{3}{2}\,a$$
The proper smoothing time is 3/2 of the average segment length, and is independent of the
time of flight and all other factors.
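A brute-force check of the Poisson-case result, assuming the same P(l) = e^(-l/a) and Q(S) proportional to S^(3/2) with constant factors dropped: the maximizing S should come out at 3a/2 whatever the time of flight. The values a = 20 s and t_f = 5 or 15 s are hypothetical.

```python
import math

def hit_probability_poisson(S: float, a: float, tf: float) -> float:
    """f(S) = P(S + tf) * Q(S), with P(l) = exp(-l/a) and Q(S) = S**1.5 (constants dropped)."""
    return math.exp(-(S + tf) / a) * S ** 1.5

def best_smoothing_time(a: float, tf: float, steps: int = 20_000) -> float:
    """Grid search for the maximizing S on (0, 10a]."""
    grid = (10.0 * a * k / steps for k in range(1, steps + 1))
    return max(grid, key=lambda S: hit_probability_poisson(S, a, tf))

a = 20.0                      # hypothetical average segment length, s
for tf in (5.0, 15.0):        # hypothetical times of flight, s
    print(tf, best_smoothing_time(a, tf), 1.5 * a)
```

The time of flight only multiplies f(S) by the constant e^(-t_f/a), which is why it drops out of the optimum.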
The presence of gun dispersion and computer
errors which are independent of smoothing
time decreases the best S from this value. In
this case the equation for optimal S is the
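quadratic
$$\sigma_1^2\left(\frac{S}{a}\right)^2 + \sigma_2^2\,\frac{S}{a} - \frac{3}{2}\,\sigma_2^2 = 0 ;$$
hence
$$\frac{S}{a} = \frac{-\sigma_2^2 + \sigma_2\sqrt{\sigma_2^2 + 6\sigma_1^2}}{2\sigma_1^2} .$$
Here σ_1 is the part of the errors which is independent of smoothing time (gun dispersion,
errors in the computer, etc.) and σ_2 (a/S)^(1/2) is the error
which varies inversely with the square root of
S, σ_2 being its value at S = a. Ordinarily σ_2 is
several times σ_1, in which case we have approximately
$$\frac{S}{a} \approx \frac{3}{2} - \frac{9}{4}\left(\frac{\sigma_1}{\sigma_2}\right)^2 .$$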
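The quadratic and its approximation can be checked numerically. This sketch, with hypothetical values σ_1 = 1 and σ_2 = 4, maximizes e^(-x)(σ_1² + σ_2²/x)^(-3/2) over x = S/a (the hit probability with constant factors dropped) and compares the result with the positive root of the quadratic and with the approximate formula.

```python
import math

def optimal_ratio(sigma1: float, sigma2: float) -> float:
    """Positive root x = S/a of sigma1^2 x^2 + sigma2^2 x - 1.5 sigma2^2 = 0."""
    if sigma1 == 0.0:
        return 1.5                      # no gun dispersion: S = 3a/2
    s1, s2 = sigma1 ** 2, sigma2 ** 2
    return (-s2 + sigma2 * math.sqrt(s2 + 6.0 * s1)) / (2.0 * s1)

def approx_ratio(sigma1: float, sigma2: float) -> float:
    """Approximation valid when sigma2 is several times sigma1."""
    return 1.5 - 2.25 * (sigma1 / sigma2) ** 2

def numeric_ratio(sigma1: float, sigma2: float, steps: int = 100_000) -> float:
    """Maximize exp(-x) * (sigma1^2 + sigma2^2 / x) ** -1.5 over 0 < x <= 5."""
    def f(x: float) -> float:
        return math.exp(-x) * (sigma1 ** 2 + sigma2 ** 2 / x) ** -1.5
    return max((5.0 * k / steps for k in range(1, steps + 1)), key=f)

sigma1, sigma2 = 1.0, 4.0   # hypothetical error components
print(optimal_ratio(sigma1, sigma2), numeric_ratio(sigma1, sigma2), approx_ratio(sigma1, sigma2))
```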
There are other factors which we have neg-
lected, which decrease the best smoothing time
still further. The wandering of the target about
the predictable segments assumed in the above
simplified analysis makes old data less reliable
and therefore reduces S. Also, there is the tac-
tical consideration that when starting to track
a target it is desirable to commence firing as
soon as possible, even if reducing this time
makes individual hits somewhat less probable.
For these and other reasons the best smooth-
ing time will be just a fraction of a.
NONLINEAR AND VARIABLE SYSTEMS
The compromise required in choosing a cer-
tain definite smoothing time can be eliminated
by the use of nonlinear elements. In particular,
if a method is devised for determining when
changes of course occur, this indication can be
used to start a new linear but variable smooth-
ing operation, so that the device uses all the
data pertinent to the present segment and no
data from previous segments. There is a clear
improvement in such cases although not so
great as might be expected. There are many
practical difficulties in proper adjustment of
such a "trigger" action. If the trigger is too
sensitive it will assume new segments due
merely to tracking noise and seldom allow suffi-
cient smoothing for accurate fire. If it is too
insensitive it fails in its function of quickly
locating changes of segment. Since the noise
and target courses are subject to considerable
variation, this adjustment is not easy.
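A rough illustration of the "trigger" idea, offered as a sketch rather than a design from the text: the hypothetical smoother below averages every sample since the last detected change and restarts whenever a new sample departs from the current average by more than a fixed threshold. The threshold, noise level and signal are all assumptions; a threshold near the noise level would restart almost constantly, while one far above it would miss small changes of segment.

```python
import random

def triggered_smoother(samples, threshold):
    """Average every sample since the last detected change of segment.

    A change is assumed, and the average restarted, whenever a new sample
    differs from the current average by more than `threshold`.
    """
    out, total, count = [], 0.0, 0
    for x in samples:
        if count and abs(x - total / count) > threshold:
            total, count = 0.0, 0            # trigger: drop data from the old segment
        total += x
        count += 1
        out.append(total / count)
    return out

rng = random.Random(0)
signal = [10.0] * 50 + [25.0] * 50                  # hypothetical piecewise-constant parameter
noisy = [s + rng.gauss(0.0, 1.0) for s in signal]   # flat tracking noise, unit standard deviation
est = triggered_smoother(noisy, threshold=5.0)
print(round(est[49], 2), round(est[99], 2))         # near 10 and near 25
```

Between triggers the smoothing is linear (a growing uniform average); the only nonlinearity is the restart test.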
In such a system the smoothing may be
linear — the only nonlinearity is the tripping
circuit. The analysis of best weighting func-
tions, etc., given in later chapters can for the
most part be applied to such cases. There may
also be advantages to be derived from making
the smoothing operator depend on the general
position in space of the target relative to the
gun. The smoothing time may be varied, for
example, as a function of the time of flight.
This type of variation would be slow compared
to the noise frequency, and here again the
linear analysis can be used.
Whether any real advantage can be obtained
by "strongly" nonlinear smoothing in practical
cases other than these two possibilities is ques-
tionable.
Chapter 10
SMOOTHING FUNCTIONS FOR CONSTANTS
The analytic arc assumption described in
the previous chapter immediately allows us
to reduce a vast proportion of data-smoothing
problems to a relatively concrete form. Obviously the arc will be specified by a number of
parameters and the principal object of the com-
puting and data-smoothing circuits must be to
isolate values of these parameters on the basis
of which a prediction can be made. In practi-
cal cases the instantaneous values of the
parameters are isolated by coordinate con-
verters. The function of the data-smoothing
circuit is to provide a suitable average from
these instantaneous values. This is called
"smoothing a constant" here since the parameters are assumed to be constant along each
arc, although they may change radically from
one arc to another.
The data-smoothing network is most con-
veniently specified by its impulsive admittance.
(See Appendix A.) In accordance with the
assumptions made in the previous chapter, it
will be assumed that the desired impulsive ad-
mittance is identically zero after some limiting
time T. Thus, T seconds after a change from
one analytic arc to the next the new parameter
value is established. T is the so-called "settling
time" of the data-smoothing network.
With the settling time limit given, the prob-
lem of choosing a suitable data-smoothing net-
work reduces to that of finding the best shape
of the impulsive admittance characteristic for
t < T. Obviously this shape determines how
the output of the network changes in going
from the parameter value appropriate for the
first arc to that appropriate for the second. The
exact way in which the response settles from
one constant value to the next is, however,
usually of comparatively little interest. The
shape of the weighting function is of impor-
tance chiefly because of its effect on the noise.
For each noise spectrum there is, in principle,
an optimum shape for the weighting function.
The present chapter approaches the problem of
choosing a shape which will minimize the effect
of noise from several points of view.
It should be noted that the term noise as used
here does not necessarily refer to the errors
associated directly with the tracking data. The
tracking data may have been subjected to co-
ordinate conversions, differentiations, or other
processes of computation before reaching the
data-smoothing network.¹ The noise associated
with the signal to be smoothed thus will usually
have characteristics differing from those of the
noise associated with the tracking data.
10.1 EXPONENTIAL SMOOTHING
Before attacking the problem of smoothing a
constant in a systematic way it is worth while
to consider an important special case. This is
the so-called exponential smoothing circuit. It
leads to a data-smoothing network in which
the output V is related to the input E by
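$$V(t) = \alpha \int_0^{\infty} e^{-\alpha\tau}\,E(t - \tau)\,d\tau ,$$
where α is a positive constant (the reciprocal of the time constant),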
so that the impulsive admittance W(t) is an
exponential function of time, as illustrated by
Figure 1.
Figure 1. Simple exponential weighting function.
An impulsive admittance of the type shown
in Figure 1 does not show any very definite
settling time. The exponential curve ap-
proaches zero gradually, and it is a long time
after a change in course before the effects of
the data obtained on the old course are negligible. This is obviously an undesirable result,
and the exponential weighting function is consequently not a recommended one for situations
to which the analytic arc assumption applies.
The exponential solution is, however, described
here because it occurs in such a vast variety of
cases. It is found, in fact, whenever the data-smoothing device is specified by a linear first-order differential equation with constant coefficients. It may thus correspond to many simple
situations. For example, this is the result
which would be obtained in an electrical circuit
if we smoothed the data by placing a simple
shunt capacity across a resistance circuit. In
mechanical structures it is encountered whenever the damping depends either upon simple
inertia or a simple compliance.
1 In exceptional circumstances the physical apparatus in which these processes are carried out may also be sources of additional noise.
Simple exponential smoothing also occurs in
a variety of other situations which may be
somewhat less obvious. For example, it is the
effective result in either an aided laying or a
regenerative tracking scheme whenever the
ratio between rate and displacement correc-
tions is fixed. Another somewhat similar ex-
ample is furnished by the feedback amplifier
circuit shown in Figure 2.
Figure 2. Feedback amplifier circuit giving simple exponential weighting function.
Since rapid fluctuations in the output of this amplifier are fed
back through the capacity and tend to oppose
the input voltage, the structure acts as a
smoother, and more detailed analysis would
show that it has characteristics similar to those
obtained by using a shunt capacity across a
resistance circuit. The structure is introduced
here because considerable use is made of it in
connection with the discussion of nonlinear
smoothing in a later chapter.
One simple conclusion about data-smoothing
networks can be drawn immediately from this
discussion. Since all structures simple enough
to be specified by a first-order differential equa-
tion give exponential smoothing, which has no
very well-marked settling time, it is clear that
a data-smoothing network which shows a well-
defined settling time must probably be at least
moderately complicated.
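The lack of a definite settling time can be seen in a discrete sketch. The first-order recursion below is assumed here as a sampled analogue of the simple shunt-capacity smoother; its weights decay exponentially, whereas a uniform window of T samples forgets old data completely after T samples. The values α = 0.1 and T = 19 are hypothetical and were chosen so that both give the data the same mean age of 9 samples.

```python
def exponential_smoother(samples, alpha):
    """First-order recursion y[k] = y[k-1] + alpha * (x[k] - y[k-1]).

    Unrolled, y[k] weights a sample of age m by alpha * (1 - alpha)**m,
    a sampled exponential with no definite settling time.
    """
    y, out = samples[0], []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def uniform_smoother(samples, T):
    """Average of the most recent T samples: weights vanish beyond age T - 1."""
    out = []
    for k in range(len(samples)):
        window = samples[max(0, k - T + 1): k + 1]
        out.append(sum(window) / len(window))
    return out

signal = [0.0] * 100 + [1.0] * 100   # parameter jumps from 0 to 1 at sample 100
alpha, T = 0.1, 19                   # hypothetical; both give a mean data age of 9 samples
exp_out = exponential_smoother(signal, alpha)
box_out = uniform_smoother(signal, T)
settle = 100 + T - 1                 # T samples after the change
print("old-arc residue, exponential:", 1.0 - exp_out[settle])   # 0.9**19, about 0.135
print("old-arc residue, uniform:    ", 1.0 - box_out[settle])   # 0.0
```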
10.2 CURVE-FITTING METHOD
Consider the signal E shown in Figure 3
under the assumption that the true signal is
constant and the superposed noise is random with a flat spectrum.
Figure 3. Piecewise constant signal with noise.
The best constant A, in the least squares sense, which can be fitted to
the signal from t − T to t is that which minimizes
$$\int_{t-T}^{t} [A - E(\lambda)]^2\,d\lambda ,$$
viz.,
$$A = \frac{1}{T}\int_{t-T}^{t} E(\lambda)\,d\lambda . \qquad (1)$$
Comparing this with equation (2), Appendix
A, it will be seen that A, which is obviously a
function of t, is the response to the assumed
signal of a network whose impulsive admit-
tance is
$$W(t) = \frac{1}{T}, \qquad 0 < t < T . \qquad (2)$$
This is the best weighting function for smooth-
ing under the assumed circumstances. It is
illustrated in Figure 4.
Figure 4. Best weighting function for smoothing piecewise constant signal.
A more complex situation is one in which the
true signal is a line of constant slope with
superposed flat random noise, as shown in Figure 5. For convenience the analysis will be
conducted in terms of the age variable τ = t − λ.
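Figure 5. Piecewise linearly varying signal with noise.
The best straight line A − Bτ which can be fitted to the signal from τ = 0 to τ = T is that
which minimizes
$$\int_0^T [A - B\tau - E(t - \tau)]^2\,d\tau .$$
Hence A and B must satisfy simultaneously
$$A - \frac{T}{2}\,B = \frac{1}{T}\int_0^T E(t - \tau)\,d\tau ,$$
$$\frac{A}{2} - \frac{T}{3}\,B = \frac{1}{T^2}\int_0^T \tau\,E(t - \tau)\,d\tau . \qquad (3)$$
Eliminating A, we get
$$B = \frac{12}{T^3}\int_0^T \left(\frac{T}{2} - \tau\right) E(t - \tau)\,d\tau ,$$
whence by partial integration
$$B = \frac{6}{T^3}\int_0^T E'(t - \tau)\,\tau(T - \tau)\,d\tau .$$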
Comparing this with (7), Appendix A, it will
be seen that B, which is obviously a function of
t, is the response to the derivative of the as-
sumed signal of a network whose impulsive
admittance is
$$W(t) = \frac{6}{T^3}\,t(T - t), \qquad 0 < t < T . \qquad (4)$$
This is the best weighting function for smoothing the derivative of the signal under the assumed circumstances. It is illustrated in Figure 6 and is generally referred to as the "parabolic weighting function."
Figure 6. Best weighting function for smoothing piecewise linearly varying signal.
It should be noted also that the right-hand
member of the first of equations (3) is formally the same as that of equation (1). Hence
the response of the network specified by (2)
and illustrated in Figure 4, to the type of
signal shown in Figure 5, will correspond to
the value on the best straight line T/2 seconds
back from t, the present time. This network is
still the best for smoothing the signal, but it
introduces a delay of one half of the smooth-
ing time. The delay may be reduced only at
the price of a reduction in smoothing unless the
smoothing time is increased.
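Discrete versions of both curve-fitting results can be checked directly. Assuming equally spaced samples (one per unit time), the least-squares constant over the last T samples equals their plain average, the sampled form of (2), and the least-squares slope equals a weighted average of first differences whose weights are parabolic in age, the sampled form of (4). The signals and helper names below are hypothetical.

```python
import random

def ls_constant(window, lo=-100.0, hi=100.0, iters=200):
    """Constant A minimizing sum((A - E)^2) over the window, by ternary search."""
    def cost(A):
        return sum((A - e) ** 2 for e in window)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def uniform_weighting(samples, T):
    """Sampled form of (2): weight 1/T on each of the last T samples."""
    return sum(samples[-T:]) / T

def ols_slope(y):
    """Least-squares slope of y against the sample index 0..N-1."""
    n = len(y)
    xbar, ybar = (n - 1) / 2, sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    return sum((i - xbar) * (v - ybar) for i, v in enumerate(y)) / sxx

def parabolic_weights(n):
    """Weights (j+1)(n-1-j) on the n-1 first differences, normalized to sum to 1."""
    raw = [(j + 1) * (n - 1 - j) for j in range(n - 1)]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_derivative(y):
    """Sampled form of (4): parabolic weighting applied to first differences."""
    diffs = [y[j + 1] - y[j] for j in range(len(y) - 1)]
    return sum(w * d for w, d in zip(parabolic_weights(len(y)), diffs))

rng = random.Random(2)
T = 25
samples = [3.0 + rng.gauss(0.0, 0.5) for _ in range(100)]        # constant plus flat noise
y = [5.0 + 0.3 * i + rng.gauss(0.0, 1.0) for i in range(40)]     # sloped line plus flat noise
print(ls_constant(samples[-T:]), uniform_weighting(samples, T))  # agree
print(ols_slope(y), weighted_derivative(y))                      # agree up to rounding
```

The slope identity follows from summation by parts, just as the continuous form (4) follows from the partial integration above.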
10.3 AUTOCORRELATION METHOD
The autocorrelation method with finite set-
tling time was first used by G. R. Stibitz in
numerical determination of the best weighting
function for smoothing the derivative of track-
ing data with typical tracking errors. This
method was also used to determine the sensitiv-
ity of smoothing to departures of the weighting
function from the best form.
The analysis is based upon the expression
$$V(t) = \int_0^T g'(t - \tau)\,W(\tau)\,d\tau , \qquad t > T$$
for the response to the derivative of the error
time function g(t) of a network whose impul-
sive admittance or weighting function W(t) is
identically zero for t > T as well as for t < 0.
Since measured tracking errors are generally
tabulated only at 1-second intervals, the in-
tegral may be approximated by the sum
$$V(t) \approx \sum_{m=0}^{T-1} W_{m+\frac{1}{2}}\,\bigl[g(t - m) - g(t - m - 1)\bigr]$$
for integral values of t.
The instantaneous transmitted power is the
square of this expression, and the average
transmitted power is
$$P_{av} = \lim_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} V^2(t) .$$
This may be expressed in the form
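$$P_{av} = \sum_{m=0}^{T-1}\sum_{n=0}^{T-1} W_{m+\frac{1}{2}}\,C_{m-n}\,W_{n+\frac{1}{2}} \qquad (5)$$
where
$$C_{m-n} = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N} \Delta g(t - m)\,\Delta g(t - n), \qquad \Delta g(t) = g(t) - g(t - 1),$$
is the autocorrelation of the error differences. Having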
computed the autocorrelation, (5) may be mini-
mized with respect to the W's by familiar
methods, under the constraint
$$\sum_{m=0}^{T-1} W_{m+\frac{1}{2}} = 1 .$$
The values of W thus obtained are the specification of the best weighting function.ᵇ Equation (5) may then be used to determine the
sensitivity of smoothing to departures of the
weighting function from the best form.
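Minimizing (5) under the constraint with a Lagrange multiplier gives C W = λ·1, so the best weights are proportional to C⁻¹ applied to a vector of ones. The sketch below assumes that reading of "familiar methods"; the two autocorrelation functions used are hypothetical examples, not Stibitz's measured tracking data, and the small linear solver is written out only to keep the example self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def best_weights(autocorr, T):
    """Minimize sum_m sum_n W_m C_(m-n) W_n subject to sum_m W_m = 1.

    The Lagrange condition C W = lambda * 1 gives W proportional to C^-1 1.
    """
    C = [[autocorr(abs(m - n)) for n in range(T)] for m in range(T)]
    u = solve(C, [1.0] * T)
    total = sum(u)
    return [v / total for v in u], C

def average_power(W, C):
    """Equation (5): the quadratic form W^T C W."""
    n = len(W)
    return sum(W[i] * C[i][j] * W[j] for i in range(n) for j in range(n))

T = 8
W_white, _ = best_weights(lambda k: 1.0 if k == 0 else 0.0, T)   # uncorrelated differences
W_best, C = best_weights(lambda k: 0.6 ** k, T)                  # hypothetical correlated case
uniform = [1.0 / T] * T
print([round(w, 3) for w in W_white])     # all equal to 1/T
print(average_power(W_best, C), average_power(uniform, C))
```

Comparing average_power for the optimal and a trial set of weights is the sensitivity check the text describes.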
Proceeding along this line, Stibitz found that
the best weighting function for typical actual
tracking errors was generally intermediate to
the uniform and parabolic ones shown in Fig-
ures 4 and 6. Furthermore, Stibitz found
that the difference in smoothing obtained from
the best weighting function on the one hand
and from the uniform or the parabolic weight-
ing function on the other hand, is negligible in
practice.
The autocorrelation method was later for-
malized by R. S. Phillips and P. R. Weiss who
incorporated it into a theory of prediction.7 A
brief exposition of this formulation is given
in Appendix B.
10.4 ELEMENTARY PULSE METHOD
For the purposes of this method, an ele-
mentary noise pulse is defined by a time func-
tion F0(t) which satisfies the following require-
ments:
1. Identically zero when t < 0.
2. Contains no terms which increase expo-
nentially with time.
3. Power spectrum N(ω²) is the same as that
of the noise.
The noise is then regarded as the result of
elementary noise pulses started at random.
Alternatively, it may be regarded as the result
of flat random noise passed through a network
whose transmission function is S(p) = L[F_0(t)]. As a matter of fact, only S(p) is
required in the analysis, and this is readily determined from the relation
$$|S(i\omega)|^2 = N(\omega^2) ,$$
together with the condition that S(p) corresponds to the transmission function of a
minimum-phase physical structure (cf. Appendix B).
The response F(t) to the elementary noise
pulse F_0(t) of a network whose impulsive admittance is W(t) is given by the operational
equation
$$F(t) = S(p)\,W(t)$$
in accordance with the footnote in Section A.5,
Appendix A. The best form for W(t) is therefore that which minimizes the integral
$$\int_0^{\infty} [F(t)]^2\,dt \qquad (6)$$
under the restriction
$$\int_0^{t_0} W(t)\,dt = 1 \quad \text{when } t_0 > T . \qquad (7)$$
b The computations involved may be considerably re-
duced by noting the symmetry property proved in Sec-
tion B.2, Appendix B.
This is as much of the elementary pulse
method as we shall need in order to reconsider
the cases treated in Section 10.2. For the treat-
ment of more general cases the method is de-
scribed in greater detail in Appendix B.
The minimization of the integral (6) under
the restriction (7) reduces to a simple isoperi-
metric problem in the calculus of variations, in
cases in which S(p) is a polynomial in p. It is
essential first of all, however, to note that if
S(p) is of degree n, the integral (6) will converge only if W(t) is differentiable at least n
times. In other words, W(t) must have continuous derivatives of all orders up to the
(n-1)th inclusive, although the nth derivative
may have finite discontinuities. In particular,
if W(t) is to be zero outside of 0 < t < T, its
derivatives of orders up to the (n-1)th inclusive must vanish at both t = 0 and t = T. These
2n boundary conditions must be imposed on the
solution of the Euler equation, which in this
case is
$$S(p)\,S(-p)\,W(t) = \lambda .$$
λ is a constant parameter which is finally adjusted so that the restriction (7) is satisfied.
The first case treated in Section 10.2 is one in which $N(\omega^2) = 1$, whence S(p) = 1 and F(t) = W(t). The integral (6) is a minimum under the restriction (7) if W(t) is constant by intervals. The restriction (7) then requires W(t) to be of the form (2).
The case of first derivative smoothing treated in 10.2 is one in which $N(\omega^2) = \omega^2$, whence S(p) = p and $F(t) = W'(t)$. If the integral (6) is to converge at all, $W'(t)$ must not have discontinuities of impulsive or higher type; in other words, W(t) must be continuous through all values of t. The integral is a minimum under the restriction (7) if $W''(t)$ is constant by intervals. The restriction (7) then requires W(t) to be of the form (4).
These results may be generalized immediately. In whatever way the signal to be smoothed may have been derived from the tracking data, let the power spectrum of the noise associated with it be $N(\omega^2) = \omega^{2n}$. Then $S(p) = p^n$ and $F(t) = W^{(n)}(t)$. If the integral (6) is to converge at all, $W^{(n-1)}(t)$ must be continuous through all values of t. The integral is a minimum under the restriction (7) if $W^{(2n)}(t)$ is constant by intervals. The restriction (7) then requires W(t) to be of the form

$$W(t) = \frac{(2n+1)!}{(n!)^2}\,\frac{1}{T}\left[\frac{t}{T}\left(1 - \frac{t}{T}\right)\right]^n, \qquad 0 < t < T . \tag{8}$$
It may be noted that the convergence requirements which arise in the foregoing discussion are directly related to the discussion and theorem in Section A.8, Appendix A, with respect to the relationship between discontinuities in the impulsive admittance and its derivatives on the one hand, and the ultimate cutoff characteristic of the transmission function on the other hand. The continuity of $W^{(n-1)}(t)$ is obviously required to make the transmission fall off ultimately at the rate of 6(n + 1) db per octave against the rise of 6n db per octave in the noise power spectrum.
The integral (6) may also be used to evaluate the relative advantage of the best weighting function over another weighting function. As an example, consider the case where the weighting function (2) is the best. The value of the integral (6) in this case is 1/T. If the weighting function (4) is used against the same noise, the value of the integral (6) is 6/(5T). Hence, as far as rms error or standard deviation is concerned, the second weighting function is $\sqrt{5/6}$, or 0.913, as efficient as the first.
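The figures quoted above can be reproduced with exact polynomial arithmetic. The sketch below is an illustration under stated assumptions (Python with NumPy; the helper names `best_weighting` and `definite` are hypothetical). It builds the weighting function (8) for a given n and settling time T, checks the normalization required by (7), and evaluates the integral (6) for flat noise, where S(p) = 1, using the uniform form (n = 0, form (2)) and the parabolic form (n = 1, form (4)).

```python
from math import factorial, sqrt
from numpy.polynomial import Polynomial

def best_weighting(n, T=1.0):
    """Weighting function (8) on 0 < t < T as a polynomial in t (hypothetical helper)."""
    x = Polynomial([0.0, 1.0 / T])                 # t/T
    shape = (x * (1 - x)) ** n                     # [(t/T)(1 - t/T)]^n
    return shape * (factorial(2 * n + 1) / (factorial(n) ** 2 * T))

def definite(poly, a, b):
    """Exact integral of a polynomial from a to b."""
    F = poly.integ()
    return F(b) - F(a)

T = 2.0
W0 = best_weighting(0, T)    # uniform weighting, form (2)
W1 = best_weighting(1, T)    # parabolic weighting, form (4): 6 t (T - t) / T^3

# Restriction (7): each weighting function has unit area (up to rounding)
print(definite(W0, 0, T), definite(W1, 0, T))

# Integral (6) for flat noise: S(p) = 1, so F(t) = W(t)
I_uniform = definite(W0 ** 2, 0, T)        # expected 1/T
I_parabolic = definite(W1 ** 2, 0, T)      # expected 6/(5T)
print(I_uniform * T, I_parabolic * T)

# Relative rms efficiency of the parabola against flat noise: sqrt(5/6)
print(sqrt(I_uniform / I_parabolic))
```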
Chapter 11
SMOOTHING FUNCTIONS FOR GENERAL POLYNOMIAL EXPANSIONS
THE THEORY of "smoothing a constant" developed in the preceding chapter will be extended in this chapter to the problem of smoothing a polynomial function of time of any prescribed degree. The extension is, however, restricted to the case of a flat noise spectrum.

In addition to the smoothing problem, the analysis also provides a way of designing a network which will extrapolate the polynomial a given distance $t_f$ into the future. The network is so arranged that $t_f$ is continuously variable. In addition, the degree of the polynomial can readily be changed to fit changes in the complexity of the assumed form of the data, apart from noise.
It is clear that these results amount, in a certain sense, to an alternative to Wiener's method for the design of prediction circuits for general time series. Thus, to predict a time series of any given complexity we would need only to begin with a polynomial of sufficiently high degree to fit the observed data, and extrapolate. Aside from the restriction to a flat noise spectrum, perhaps the most obvious difference from Wiener's method is the fact that the settling time restriction limits the data upon which the prediction rests to a finite interval in the past. To advance such a prediction theory seriously, however, it would be necessary to go much farther into the way in which the degree of the polynomial is established and the justification for assuming that the extrapolated value represents a probable future value for the function.$^{a}$

$^{a}$ As an example of possible difficulties we may notice the fact that two polynomials of different degree which approximate a given function as closely as possible, in a least squares sense, in a prescribed interval frequently differ radically outside that interval.

This general discussion will not be undertaken here. Since prediction with high degree polynomials will certainly be sensitive to minor irregularities in the data, tracking errors would necessarily limit the application of the method in any case. If we confine ourselves to reasonably low degree polynomials, however, the method is useful. An example is furnished by the prediction of airplane position, in rectangular coordinates, by quadratic functions of time. Here the square terms represent the effects of accelerations in the various coordinates. We can defend the inclusion of such terms on the ground that it is plausible to assume that an airplane may experience constant accelerations, due to turns, the force of gravity, etc., for considerable periods of time. The linear term represents plane velocity and needs no defense. The constant term, of course, gives the plane position at some reference time. Including it in the smoothing operation is equivalent to introducing "present-position" smoothing of the sort suggested by the broken lines in Figure 1 of Chapter 7.$^{b}$
Aside from its direct interest as a possible prediction method, the analysis in this chapter is also of indirect interest for the additional light it sheds on the effect of the noise spectrum on smoothing functions. It turns out that smoothing a power of time, with a flat noise spectrum, is equivalent to smoothing a constant with a somewhat different noise spectrum. Thus the smoothing functions developed for polynomials are also useful as special cases of smoothing functions applicable to constants.
11.1

Let λ be any past value of time and let t be the present value. If the data E(λ) are fitted with a smooth curve $\bar E(\lambda)$, the predicted value may be taken as $\bar E(t + t_f)$. The procedure of fitting is the familiar one of minimizing the integral

$$\int_{-\infty}^{t} [E(\lambda) - \bar E(\lambda)]^2\,W_0(t,\lambda)\,d\lambda \tag{1}$$

with respect to disposable parameters in $\bar E(\lambda)$ and a prescribed weighting function $W_0(t,\lambda)$. The lower limit of the integral is indicated as −∞ in compliance with the physical impossibility of discriminating between relevant and irrelevant data, with fixed linear networks, except on the basis of age. The burden of discrimination must be relegated to the weighting function, which must be a function only of the age t − λ. Under the ideal restriction that $W_0(t - \lambda)$ is identically zero when t − λ > T, or λ < t − T, the indicated lower limit of the integral is purely nominal.

$^{b}$ In the circuit of Figure 1, Chapter 7, however, the smoothing network would produce a lag in the present-position data delivered to the prediction circuit, and this lag would, of course, mean some error in following a moving target. In the method described in this chapter such lags are automatically compensated for by adjustments in the coefficients of the other terms of the polynomial.
As in Section 10.2, it is convenient to conduct the analysis in terms of the age variable τ = t − λ introduced there. If

$$F(\tau) = E(\lambda), \qquad \bar F(\tau) = \bar E(\lambda),$$

the integral to be minimized may be expressed in the form

$$\int_0^\infty [F(\tau) - \bar F(\tau)]^2\,W_0(\tau)\,d\tau .$$

In accordance with the discussion of quasi-distortionless transmission networks in Section A.10, Appendix A, the smooth curve $\bar E(\lambda)$ should be a polynomial in λ. Hence $\bar F(\tau)$ should be a polynomial in τ. It will be more convenient, however, to express $\bar F(\tau)$ formally as a linear combination of polynomials in τ which may be orthogonalized. Hence, let

$$\bar F(\tau) = V_0 + V_1\,G_1(\tau) + V_2\,G_2(\tau) + \cdots + V_n\,G_n(\tau) \tag{2}$$

where $G_m(\tau)$ is an mth degree polynomial in τ. Let $W_0(\tau)$ be normalized in the sense that

$$\int_0^\infty W_0(\tau)\,d\tau = 1$$

and the $G_m(\tau)$ be orthogonalized with respect to the weighting function $W_0(\tau)$ in the sense that

$$\int_0^\infty G_l(\tau)\,G_m(\tau)\,W_0(\tau)\,d\tau = \begin{cases} 0 & \text{if } l \ne m \\ 1/k_m & \text{if } l = m \end{cases} \qquad (G_0 = 1,\; k_0 = 1).$$

The integral (1) is then a minimum with respect to the $V_m$'s in (2) if

$$V_m = k_m \int_0^\infty F(\tau)\,G_m(\tau)\,W_0(\tau)\,d\tau . \tag{3}$$

In terms of the forward time λ, (2) and (3) reduce to

$$\bar E(\lambda) = V_0(t) + V_1(t)\,G_1(t-\lambda) + V_2(t)\,G_2(t-\lambda) + \cdots + V_n(t)\,G_n(t-\lambda) \tag{4}$$

where

$$V_m(t) = k_m \int_{-\infty}^{t} E(\lambda)\,G_m(t-\lambda)\,W_0(t-\lambda)\,d\lambda . \tag{5}$$
Expression (5) identifies the $V_m(t)$ as the responses to E(λ) of fixed linear networks whose impulsive admittances are

$$W_m(\tau) = k_m\,G_m(\tau)\,W_0(\tau) . \tag{6}$$

By (4), the predicted value may be obtained by a linear combination of the responses of these networks, viz.,

$$\bar E(t + t_f) = V_0(t) + G_1(-t_f)\,V_1(t) + G_2(-t_f)\,V_2(t) + \cdots + G_n(-t_f)\,V_n(t) . \tag{7}$$

A schematic representation of an nth order smoothing and prediction circuit, based on (7), is shown in Figure 1, where the $G_m(-t_f)$ are represented as potentiometer factors dependent on the time of flight.
[Figure 1: input E(t); networks Y_0(p), Y_1(p), ..., Y_n(p); potentiometer factors G_1(−t_f), ..., G_n(−t_f); output Ē(t + t_f).]

Figure 1. Schematic representation of nth order smoothing and prediction circuit.
Alternatively, (7) may be written

$$\bar E(t + t_f) = \bar E(t) + [G_1(-t_f) - G_1(0)]\,V_1(t) + \cdots + [G_n(-t_f) - G_n(0)]\,V_n(t) \tag{8}$$

where $\bar E(t)$ is then replaced by E(t) when position data smoothing is to be omitted.
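To make (5) and (7) concrete, the following is a minimal Python sketch. It assumes the uniform weighting function of Section 11.3 ($W_0 = 1$ on 0 ≤ τ ≤ 1), the first three $G_m(\tau)$ tabulated there, and $k_0 = 1$, $k_1 = 12$, $k_2 = 720$ from the expression for $1/k_m$ given in that section. The network responses $V_m(t)$ are computed by Gauss-Legendre quadrature instead of by physical networks, and the test signal and helper names are hypothetical. A noise-free quadratic input is fitted exactly by a second-order expansion, so the predicted value should coincide with the signal's true future value.

```python
import numpy as np

# G_m(tau) for uniform W_0 on 0 <= tau <= 1 (Section 11.3) and k_m = 1 / integral of G_m^2
G = [
    lambda tau: np.ones_like(tau),
    lambda tau: 0.5 - tau,
    lambda tau: 1 / 12 - tau / 2 + tau ** 2 / 2,
]
k = [1.0, 12.0, 720.0]

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]
x, wts = np.polynomial.legendre.leggauss(16)
tau = 0.5 * (x + 1.0)
wts = 0.5 * wts

def responses(E, t):
    """Equation (5) with W_0 = 1 on [0, 1]: V_m(t) = k_m * integral_0^1 E(t - tau) G_m(tau) dtau."""
    return [k_m * np.sum(wts * E(t - tau) * G_m(tau)) for G_m, k_m in zip(G, k)]

def predict(E, t, t_f):
    """Equation (7): predicted value = sum over m of G_m(-t_f) * V_m(t)."""
    V = responses(E, t)
    return sum(float(G_m(np.array(-t_f))) * V_m for G_m, V_m in zip(G, V))

# Hypothetical noise-free quadratic track
E = lambda lam: 3.0 + 2.0 * lam - 0.5 * lam ** 2
t, t_f = 4.0, 0.7
print(predict(E, t, t_f), E(t + t_f))   # should agree to rounding error
```

With noisy data the same code returns the least-squares extrapolation rather than the exact future value; the weighting and the degree of the expansion then govern the trade-off between smoothing and lag.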
It is not necessary that the $G_m(\tau)$ polynomials be orthogonal. However, the circuit switching required to reduce or increase the order of the prediction is simplest when the $G_m(\tau)$ polynomials are orthogonal. Orthogonal polynomials corresponding to any weighting function $W_0(\tau)$ are readily derived by well-known methods.
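One of the well-known methods is Gram-Schmidt orthogonalization of 1, τ, τ², ... under the inner product $\int f\,g\,W_0\,d\tau$. The Python sketch below is illustrative only: it assumes $W_0$ vanishes beyond a finite τ = T so the integrals can be taken by Gauss-Legendre quadrature on [0, T], and it scales each $G_m$ so that its $\tau^m$ coefficient is $(-1)^m/m!$, the normalization derived in Section 11.2. The function names are hypothetical. With the uniform $W_0$ of Section 11.3 it should reproduce the tabulated $G_m$ and the values $k_m$ = 1, 12, 720, 100800.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Polynomial

def orthogonal_set(W0, T, degree, nodes=64):
    """Gram-Schmidt on 1, tau, tau^2, ... with inner product integral_0^T f g W0 dtau.

    Returns G (list of Polynomial) and k (k_m = 1 / integral of G_m^2 W0).
    Each G_m is scaled so that its tau^m coefficient is (-1)^m / m! (Section 11.2).
    """
    x, w = np.polynomial.legendre.leggauss(nodes)
    tau = 0.5 * T * (x + 1.0)
    w = 0.5 * T * w * W0(tau)                   # quadrature weights times W0
    inner = lambda f, g: np.sum(w * f(tau) * g(tau))

    G, k = [], []
    for m in range(degree + 1):
        g = Polynomial.basis(m)                 # tau^m
        for prev in G:                          # remove components along earlier G's
            g = g - prev * (inner(g, prev) / inner(prev, prev))
        g = g * (((-1) ** m / factorial(m)) / g.coef[m])
        G.append(g)
        k.append(1.0 / inner(g, g))
    return G, k

def impulsive_admittance(G_m, k_m, W0):
    """Equation (6): W_m(tau) = k_m G_m(tau) W0(tau)."""
    return lambda tau: k_m * G_m(tau) * W0(tau)

# Uniform weighting on 0 <= tau <= 1 (Section 11.3)
uniform = lambda tau: np.where((tau >= 0) & (tau <= 1), 1.0, 0.0)
G, k = orthogonal_set(uniform, T=1.0, degree=3)
for m, (g, km) in enumerate(zip(G, k)):
    print(m, np.round(g.coef, 6), km)
```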
The weighting function $W_0(\tau)$ may be determined by either of the methods described in Appendix B as the best weighting function for smoothing position data under prescribed tracking error characteristics. Then the best impulsive admittances $W_m(\tau)$ for a smoothing and prediction circuit are prescribed by (6).
The relationship (6) shows that if the prescribed weighting function $W_0(\tau)$ satisfies the formal requirements for physical realizability, so will all of the impulsive admittances $W_m(\tau)$. Of the standard sets of orthogonal polynomials, those of Laguerre appear to be the best adapted to physical realization. The Laguerre polynomials $L_m^{(\alpha)}(\tau)$ are orthogonal in 0 < τ < ∞ with the weighting function $\tau^\alpha e^{-\tau}$. However, such a weighting function is, in general, very unsatisfactory from the practical point of view of settling characteristics.
It is possible, of course, to approximate any prescribed weighting function $W_0(\tau)$ as closely as may be desired in a physically realizable form, derive a set of orthogonal polynomials based on the approximate form, and determine the impulsive admittances $W_m(\tau)$ from (6). However, such a procedure leads to complexities of network configuration which increase very rapidly with the index m. This increasing complexity is hardly justifiable in practice.
From the foregoing considerations, it appears that the most practical procedure is to derive all of the impulsive admittances $W_m(\tau)$ without regard to physical realizability, approximate them independently in physically realizable forms of independently prescribed complexities, and modify or redetermine the potentiometer factors in accordance with the discussion in Section A.10, Appendix A.
11.2
WEIGHTING FUNCTIONS FOR DERIVATIVES

The impulsive admittances defined by (6) for m > 0 may not be regarded as weighting functions even though the response of the corresponding networks to E(λ) is, by (5),

$$V_m(t) = \int_0^\infty E(t-\tau)\,W_m(\tau)\,d\tau ,$$

because, with the exception of $W_0(\tau)$, the $W_m(\tau)$, as will presently be seen, cannot be normalized. The term weighting function is reserved for the functions defined by (11) below.

Since $\tau^r$ is a linear combination of the $G_s(\tau)$, where s = 0, 1, ..., r, it is obvious from (6) that

$$\int_0^\infty \tau^r\,W_m(\tau)\,d\tau = 0 \quad \text{when } r < m .$$

In particular

$$\int_0^\infty W_m(\tau)\,d\tau = 0 \quad \text{when } m > 0 .$$
Since the transmission function $Y_m(p)$ of a network is the Laplace transform of its impulsive admittance (see Section A.3), we have

$$Y_m(p) = \int_0^\infty W_m(\tau)\,e^{-p\tau}\,d\tau = \sum_{r=0}^{\infty} \frac{(-p)^r}{r!} \int_0^\infty \tau^r\,W_m(\tau)\,d\tau . \tag{9}$$

The first m terms in this series vanish. Hence $Y_m(p)$ will be of the form

$$Y_m(p) = p^m\,y_m(p) \tag{10}$$

where $y_m(0) \ne 0$. This permits us to regard the network whose impulsive admittance is $W_m(\tau)$ as an instantaneous mth order differentiator, corresponding to the factor $p^m$ in (10), in tandem with a purely smoothing network whose transmission function is $y_m(p)$.
It is convenient to associate a weighting function $w_m(\tau)$ with the purely smoothing network whose transmission function is $y_m(p)$. Dividing (10) through by $p^m$, the resulting operational equation may be interpreted (see Section A.5) to mean that the weighting function $w_m(\tau)$ is the m-fold integral of the impulsive admittance $W_m(\tau)$ between the limits 0 and τ. This is expressed by

$$w_m(\tau) = \int_0^\tau \cdots \int_0^\tau W_m(\tau)\,(d\tau)^m . \tag{11}$$

By a relationship similar to (9) between $y_m(p)$ and $w_m(\tau)$, it follows from $y_m(0) \ne 0$ that

$$\int_0^\infty w_m(\tau)\,d\tau \ne 0 .$$
Hence the $w_m(\tau)$ may be normalized in the sense that

$$\int_0^\infty w_m(\tau)\,d\tau = 1$$

for all values of m. However, this may be done in general only if the $G_m(\tau)$ polynomials are not normalized in the sense that $k_m = 1$ for any value of m > 0. It is in fact readily shown that the coefficient of $\tau^m$ in $G_m(\tau)$ must be the same as that of $\tau^m$ in $e^{-\tau}$, namely $(-1)^m/m!$.

11.3
LEGENDRE POLYNOMIALS

The Legendre polynomials $P_m(x)$ are orthogonal with respect to the range −1 < x < 1 and uniform weighting. In other words, the polynomials $P_m(2\tau - 1)$ are orthogonal with respect to the range 0 < τ < ∞ and the weighting function$^{c}$

$$W_0(\tau) = \begin{cases} 1 & \text{when } 0 \le \tau \le 1 \\ 0 & \text{when } \tau > 1 . \end{cases}$$

It is known from Section 10.4 that this form for the weighting function $W_0(\tau)$ is best in case the tracking errors are flat random noise. In the integral (1) to be minimized, the $G_m(\tau)$ polynomials should then be

$$G_m(\tau) = (-1)^m\,\frac{m!}{(2m)!}\,P_m(2\tau - 1) .$$

The first few of these are tabulated below.

$$\begin{array}{c|l} m & G_m(\tau) \\ \hline 0 & 1 \\ 1 & \tfrac{1}{2} - \tau \\ 2 & \tfrac{1}{12} - \tfrac{\tau}{2} + \tfrac{\tau^2}{2} \\ 3 & \tfrac{1}{120} - \tfrac{\tau}{10} + \tfrac{\tau^2}{4} - \tfrac{\tau^3}{6} \end{array}$$

$^{c}$ The unit of time being equal to the nominal smoothing time.

With the help of the formula

$$\int_{-1}^{1} [P_m(x)]^2\,dx = \frac{2}{2m+1}$$

it is readily determined that

$$\frac{1}{k_m} = \int_0^\infty [G_m(\tau)]^2\,W_0(\tau)\,d\tau = \frac{(m!)^2}{(2m)!\,(2m+1)!} .$$

Then, by (6),

$$W_m(\tau) = \begin{cases} (-1)^m\,\dfrac{(2m+1)!}{m!}\,P_m(2\tau - 1) & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases}$$

Substituting this in turn into (11) and making use of Rodrigues' formula

$$P_m(x) = \frac{(-1)^m}{2^m\,m!}\,\frac{d^m}{dx^m}\,(1 - x^2)^m$$

or

$$P_m(2\tau - 1) = \frac{(-1)^m}{m!}\,\frac{d^m}{d\tau^m}\,[\tau(1-\tau)]^m ,$$

it is finally found that

$$w_m(\tau) = \begin{cases} \dfrac{(2m+1)!}{(m!)^2}\,[\tau(1-\tau)]^m & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases} \tag{12}$$

By a relationship of the form of (9), the transmission functions $y_m(p)$ corresponding to the weighting functions $w_m(\tau)$ may be determined. The first three are

$$\begin{aligned} y_0(p) &= \frac{1 - e^{-p}}{p} \\ y_1(p) &= \frac{6}{p^3}\left[(p - 2) + (p + 2)\,e^{-p}\right] \\ y_2(p) &= \frac{60}{p^5}\left[(p^2 - 6p + 12) - (p^2 + 6p + 12)\,e^{-p}\right] . \end{aligned}$$

These may be written in the form

$$y_m(p) = Q_m(\omega)\,e^{-p/2}, \qquad p = i\omega , \tag{13}$$

where, with $x = \omega/2$,

$$\begin{aligned} Q_0(\omega) &= \frac{\sin x}{x} \\ Q_1(\omega) &= \frac{3}{x^2}\left(\frac{\sin x}{x} - \cos x\right) \\ Q_2(\omega) &= \frac{15}{x^2}\left[\left(\frac{3}{x^2} - 1\right)\frac{\sin x}{x} - \frac{3}{x^2}\cos x\right] \end{aligned} \tag{14}$$

or in the infinite power-series form

$$y_2(p) = 60\sum_{n=0}^{\infty} \frac{(n+1)(n+2)}{(n+5)!}\,(-p)^n . \tag{15}$$
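The chain of substitutions leading to (12) can be checked with exact polynomial arithmetic. In the Python sketch below (NumPy assumed; helper names hypothetical), $P_m(2\tau - 1)$ is built from the Rodrigues form quoted above, $W_m(\tau)$ is formed, integrated m times from 0 as in (11), and compared with (12). The same polynomials confirm the vanishing moments $\int_0^1 \tau^r W_m\,d\tau = 0$ for r < m from Section 11.2 and the unit area of each $w_m$.

```python
from math import factorial
import numpy as np
from numpy.polynomial import Polynomial

tau = Polynomial([0.0, 1.0])

def shifted_legendre(m):
    """P_m(2 tau - 1) = (-1)^m / m! * d^m/dtau^m [tau (1 - tau)]^m (Rodrigues)."""
    return ((tau * (1 - tau)) ** m).deriv(m) * ((-1) ** m / factorial(m))

def W(m):
    """Impulsive admittance on 0 <= tau <= 1: (-1)^m (2m+1)!/m! P_m(2 tau - 1)."""
    return shifted_legendre(m) * ((-1) ** m * factorial(2 * m + 1) / factorial(m))

def w_from_11(m):
    """Equation (11): m-fold integral of W_m from 0 to tau."""
    return W(m).integ(m, lbnd=0)

def w_closed(m):
    """Equation (12) on 0 <= tau <= 1."""
    return (tau * (1 - tau)) ** m * (factorial(2 * m + 1) / factorial(m) ** 2)

def integral_01(poly):
    """Integral of a polynomial over 0 <= tau <= 1."""
    return poly.integ(lbnd=0)(1.0)

grid = np.linspace(0.0, 1.0, 11)
for m in range(5):
    gap = np.max(np.abs(w_from_11(m)(grid) - w_closed(m)(grid)))
    moments = [integral_01(tau ** r * W(m)) for r in range(m)]    # should all vanish
    print(m, gap, moments, integral_01(w_closed(m)))
```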
Methods for obtaining physically realizable approximations to the weighting functions $w_m(\tau)$ or impulsive admittances $W_m(\tau)$, based upon the Q functions (14) and the series expansions (15), are described in Chapter 12.
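As a numerical cross-check of the closed forms for $y_m(p)$, the Q functions (14), and the series (15), the Python sketch below computes each $y_m(p)$ directly as the Laplace transform of $w_m(\tau)$ in (12) by Gauss-Legendre quadrature and compares the results. It assumes NumPy, takes $x = \omega/2$ in (14), and uses arbitrary test frequencies; the function names are hypothetical.

```python
import numpy as np
from math import factorial

def w(m, tau):
    """Weighting function (12) on 0 <= tau <= 1."""
    return factorial(2 * m + 1) / factorial(m) ** 2 * (tau * (1 - tau)) ** m

def y_quadrature(m, p, nodes=80):
    """y_m(p) = integral_0^1 w_m(tau) exp(-p tau) dtau by Gauss-Legendre quadrature."""
    x, wt = np.polynomial.legendre.leggauss(nodes)
    tau, wt = 0.5 * (x + 1.0), 0.5 * wt
    return np.sum(wt * w(m, tau) * np.exp(-p * tau))

def y_closed(m, p):
    """Closed forms for y_0, y_1, y_2."""
    if m == 0:
        return (1 - np.exp(-p)) / p
    if m == 1:
        return 6 / p ** 3 * ((p - 2) + (p + 2) * np.exp(-p))
    if m == 2:
        return 60 / p ** 5 * ((p ** 2 - 6 * p + 12) - (p ** 2 + 6 * p + 12) * np.exp(-p))
    raise ValueError("closed forms given only for m = 0, 1, 2")

def Q(m, omega):
    """Equation (14), with x = omega / 2."""
    x = omega / 2
    s = np.sin(x) / x
    if m == 0:
        return s
    if m == 1:
        return 3 / x ** 2 * (s - np.cos(x))
    if m == 2:
        return 15 / x ** 2 * ((3 / x ** 2 - 1) * s - 3 / x ** 2 * np.cos(x))
    raise ValueError("Q given only for m = 0, 1, 2")

def y2_series(p, terms=40):
    """Partial sum of the power series (15)."""
    return 60 * sum((n + 1) * (n + 2) / factorial(n + 5) * (-p) ** n for n in range(terms))

for omega in (0.5, 3.0, 11.0):
    p = 1j * omega
    for m in range(3):
        ref = y_quadrature(m, p)
        print(m, omega, abs(ref - y_closed(m, p)), abs(ref - Q(m, omega) * np.exp(-p / 2)))
print(abs(y2_series(1.3) - y_closed(2, 1.3)))
```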
Chapter 12
PHYSICAL REALIZATION OF DATA-SMOOTHING FUNCTIONS
This chapter will be devoted to a brief review of some of the methods and techniques which have been used in the physical realization of data-smoothing or weighting functions. The first two sections will be devoted to methods for determining physically realizable approximations to a desired weighting function. The third section takes up the use of feedback amplifiers and servomechanisms in order to avoid the use of coils of generally fantastic sizes. The final section takes up the design of resistance-capacitance networks.
Methods of deriving physically realizable approximations of best weighting functions may be divided into two classes, which may be called, for convenience, t-methods and p-methods. The t-methods are those in which a prescribed best weighting function W(t) is approximated directly by a function $W_a(t)$ of realizable form, viz., a sum of decaying exponential terms and exponentially decaying sinusoidal terms. However, the t-methods are most useful when the approximation is restricted to a sum only of exponential terms. According to the discussion in Section A.9, Appendix A, such a restriction corresponds physically to passive RC transmission networks. A t-method was used by Phillips and Weiss in the reference quoted in Section 10.3 to obtain an approximation with one decaying exponential term and one exponentially decaying sinusoidal term. However, this method rapidly becomes unwieldy as the number of terms is increased.
The p-methods are those in which the approximation is derived indirectly from the transmission function Y(p) corresponding to W(t). A rational function $Y_a(p)$ approximating Y(p) is first determined. If it is realizable, and it usually is, then $W_a(t) = L^{-1}[Y_a(p)]$. In general, $Y_a(p)$ will have complex poles and, therefore, $W_a(t)$ will have exponentially decaying sinusoids as well as simple exponentials. This gives the p-methods a considerable advantage over the t-methods in more efficient use of network elements. The fact that this generally calls for impractical element values in passive RLC networks is not serious. As shown in Section 12.3, the use of coils may be avoided entirely by the use of feedback amplifiers.
12.1
t-METHODS

To describe the t-method,$^{a}$ let

$$W_a(t) = A_1 e^{-a_1 t} + A_2 e^{-a_2 t} + \cdots + A_n e^{-a_n t} \tag{1}$$

where the a's are prescribed and the A's are to be determined. Two considerations are involved in the determination of the A's. The first consideration is based on the relationship between the continuity conditions at t = 0 and the ultimate slope of the loss characteristic as expressed in the theorem in Section A.8. Accordingly, a number of relations of the type

$$\begin{aligned} A_1 + A_2 + \cdots + A_n &= 0 \\ a_1 A_1 + a_2 A_2 + \cdots + a_n A_n &= 0 \\ &\;\;\vdots \\ a_1^r A_1 + a_2^r A_2 + \cdots + a_n^r A_n &= 0, \qquad r < n - 1 \end{aligned} \tag{2}$$

must be satisfied. This leaves n − r − 1 of the A's for the second consideration.
The second consideration concerns the manner in which the approximation in the range t > 0 is to be made. The approximation may, for example, be required to pass through n − r − 1 points on W(t), or the first n − r − 1 moments of the approximation may be required to be equal to the corresponding moments of W(t). The latter is expressed by relations of the type

$$\frac{A_1}{a_1^s} + \frac{A_2}{a_2^s} + \cdots + \frac{A_n}{a_n^s} = \frac{1}{(s-1)!}\int_0^\infty W(t)\,t^{s-1}\,dt, \qquad s = 1, 2, \ldots, n - r - 1 \tag{3}$$
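The linear equations (2) and (3) can be solved directly once the a's are chosen. The Python sketch below is an illustration, not Foster's procedure: it assumes the parabolic weighting function (4) of Chapter 10 with unit settling time, W(t) = 6t(1 − t) on 0 ≤ t ≤ 1, takes r = 0 (only the first of (2), as in Foster's case), and uses three a's in a 3:2 geometric series chosen arbitrarily for the example. For this W(t) the right side of (3) is exactly $6/[(s+1)(s+2)(s-1)!]$.

```python
import numpy as np
from math import factorial

def parabola_moment(s):
    """(1/(s-1)!) * integral_0^1 W(t) t^(s-1) dt for W(t) = 6 t (1 - t)."""
    return 6.0 / ((s + 1) * (s + 2)) / factorial(s - 1)

def t_method(a, r):
    """Solve (2) for powers 0..r and (3) for s = 1..n-r-1; returns the A's of (1)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    rows, rhs = [], []
    for q in range(r + 1):                 # continuity conditions (2)
        rows.append(a ** q)
        rhs.append(0.0)
    for s in range(1, n - r):              # moment conditions (3)
        rows.append(a ** (-s))
        rhs.append(parabola_moment(s))
    return np.linalg.solve(np.array(rows), np.array(rhs))

a = 4.0 * np.array([1.0, 1.5, 2.25])       # illustrative 3:2 geometric series
A = t_method(a, r=0)

def W_a(t):
    """Approximation (1) evaluated at an array of times t."""
    return np.sum(A[:, None] * np.exp(-np.outer(a, t)), axis=0)

print(A)
print(W_a(np.array([0.0, 0.25, 0.5, 1.0, 2.0])))
```

With these a's the solution is A = (19.2, −30, 10.8), and $W_a(1)$ is still about 0.28 although W(1) = 0, which is the kind of slowly decaying tail taken up in the paragraphs that follow.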
Foster's investigations were concerned only with the parabolic weighting function (4), Chapter 10, so that only the first of (2) was involved. Numerical studies led to the belief that, with a given number of a's, the best approximation was to be had from the case in which all of the a's are equal. Hence the natural center of attention was the special form

$$W_a(t) = (A_1 t + A_2 t^2 + \cdots + A_{n-1} t^{n-1})\,e^{-at} . \tag{4}$$

At large values of t this expression reduces approximately to the last term, and if it is assumed that $A_{n-1} = 1$, the settling condition fixes a to at least a first approximation. The rest of the work of approximating the parabola is then equivalent to a problem in polynomial approximation. Once the A's are determined, a better value of a can be found from the settling condition, and the process gone through again.

$^{a}$ The t-method is principally due to R. M. Foster.
If the a's are only approximately equal, the approximation will still behave approximately like (4) with an average value used for a. The difficulty with equal or nearly equal a's is that it leads to networks with extreme element values. In order to secure satisfactory element values, it is generally necessary to depart substantially from the condition of equal a's. This results in some, but not a large, loss of efficiency in approximating the parabola. Foster recommends that the a's be chosen as a geometric series, with their geometric mean more or less around the equivalent point for equal a's. With four a's he suggests that the constant ratio in the series may be 3:2, whereas with only two a's the ratio should be raised to 2:1. These are, however, only rough values and obviously depend on individual opinion of what constitutes an unreasonable element value.
As a matter of experience, it turns out that the characteristic first obtained usually has a rather long and slowly decaying tail, as shown in Figure 1. This, of course, is equivalent to a correspondingly long "settling time," or time before a useful prediction can be made. In practice, therefore, after the preliminary design has been found, adjustments are made to bring the tail of the curve under control, partly by modifying the values of the A's slightly, and partly by contracting the time scale to bring the part of the tail which remains appreciable within the allowable settling time limits. This leads to the somewhat lopsided match to the parabola shown in Figure 2.

Figure 1. Approximation to parabolic weighting function, showing poor settling characteristic.

Figure 2. Approximation to parabolic weighting function, showing better settling characteristic.
A method of bringing the tail of the curve under control$^{b}$ is to minimize the expression

$$\int_T^\infty [W_a(t)]^2\,dt = \sum_l \sum_m C_{lm}\,A_l A_m \tag{5}$$

where

$$C_{lm} = \frac{e^{-(a_l + a_m)T}}{a_l + a_m} ,$$

under the restrictions (2) and all but the last of (3).
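A sketch of the tail-control step in (5), again illustrative: Python with NumPy, the parabola W(t) = 6t(1 − t) with unit settling time T = 1, r = 0, and four a's in an arbitrarily chosen 3:2 geometric series. Minimizing the quadratic form (5) under the linear restrictions (2) and all but the last of (3) is a standard equality-constrained least-squares problem, solved here through its Lagrange (KKT) linear system. The code then checks (5) against a direct numerical integral, and compares the result with the fully moment-matched solution, which satisfies the same restrictions and so can never have a smaller tail.

```python
import numpy as np
from math import factorial

def parabola_moment(s):
    """(1/(s-1)!) * integral_0^1 W(t) t^(s-1) dt for W(t) = 6 t (1 - t)."""
    return 6.0 / ((s + 1) * (s + 2)) / factorial(s - 1)

def wick_tail_fit(a, T, r=0):
    """Minimize (5) subject to (2) and all but the last of (3); returns A and C."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    S = np.add.outer(a, a)
    C = np.exp(-S * T) / S                               # C_lm of (5)
    rows = [a ** q for q in range(r + 1)]                # restrictions (2)
    rhs = [0.0] * (r + 1)
    for s in range(1, n - r - 1):                        # all but the last of (3)
        rows.append(a ** (-s))
        rhs.append(parabola_moment(s))
    M, b = np.array(rows), np.array(rhs)
    k = len(b)
    # KKT system: [2C  M^T; M  0] [A; mu] = [0; b]
    K = np.block([[2 * C, M.T], [M, np.zeros((k, k))]])
    sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
    return sol[:n], C

a = 3.0 * 1.5 ** np.arange(4)           # illustrative geometric series, ratio 3:2
T = 1.0
A, C = wick_tail_fit(a, T)
tail_energy = A @ C @ A

# Direct trapezoidal integral of W_a(t)^2 over T..T+20 as a check on (5)
t = np.linspace(T, T + 20.0, 200001)
W_a = (A[:, None] * np.exp(-np.outer(a, t))).sum(axis=0)
direct = np.sum((W_a[1:] ** 2 + W_a[:-1] ** 2) * np.diff(t)) / 2

# Fully moment-matched solution (all of (3)) for comparison
full = np.array([a ** 0, a ** -1.0, a ** -2.0, a ** -3.0])
A_moments = np.linalg.solve(full, [0.0, parabola_moment(1), parabola_moment(2), parabola_moment(3)])
print(A, tail_energy, direct, A_moments @ C @ A_moments)
```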
The t-method used by Phillips and Weiss is based on a 3-term approximation of the form (1) in which one a is real while the other two may be conjugate complex. The a's are not prescribed, so that there are six parameters to be determined. Four restrictions are imposed, viz., the first of (2), the first of (3), a restriction on the value of the tail area, viz.,

$$\int_T^\infty W_a(t)\,dt = \sum_{i=1}^{3} \frac{A_i}{a_i}\,e^{-a_i T} ,$$

and the cross-over condition

$$W_a(T) = 0 .$$

Finally, the transmitted noise power, which, under the assumption of flat random noise associated with the position data, takes the form (see Section 10.4)

$$\int_0^\infty [W_a(t)]^2\,dt ,$$

is minimized with respect to the two remaining parameters by numerical methods.

$^{b}$ Used by R. F. Wick.
12.2
p-METHODS

Three p-methods have been used. These will be described in chronological order.

The first p-method is one which was used by R. L. Dietzold in exploiting the use of feedback amplifiers to secure the advantages of approximations with complex exponentials. The transmission function Y(p) corresponding to the best weighting function W(t) is first formulated. The loss characteristic, $-20\log_{10}|Y(i\omega)|$, is next computed and plotted against the frequency on a logarithmic scale. Then standard equalizer design techniques are employed to approximate the loss characteristic, keeping in mind that the transmission loss in the feedback network of a feedback amplifier becomes a transmission gain for the circuit as a whole.

The second p-method is merely a more complete analytic formulation of the first, thereby avoiding the necessity for employing equalizer design techniques. It depends upon the possibility of expressing the transmission function corresponding to the best weighting function in the form of equation (13), Chapter 11, which is associated with the symmetry of the weighting function, as shown in Section A.7. The method is based upon the determination of the envelope of the Q-function. The Q-function is first differentiated in order to obtain the equation which determines the values of ω at which the maxima and minima occur. This transcendental equation is not solved but is used to eliminate the trigonometric functions in the expression of the Q-function. The resulting expression, which is an irrational function of $\omega^2$, is then squared in order to make it a rational function of $\omega^2$. The substitution $p^2 = -\omega^2$ is made and the expression is then resolved into two factors, of which one contains all the poles with negative real parts while the other contains all the poles with positive real parts, the two factors being conjugate complex when $p = i\omega$. The first factor is then taken as an approximation of the desired transmission function. Applying the method to the desired transmission functions defined by (13) and (14) of Chapter 11, we get

$$\begin{aligned} y_0(p) &= \frac{2}{2 + p} \\ y_1(p) &= \frac{12}{12 + 6p + p^2} \\ y_2(p) &= \frac{120}{120 + 60p + 12p^2 + p^3} . \end{aligned} \tag{6}$$

This last is the basis for the design of a position and rate smoothing circuit for a proposed computor for controlling bombers from the ground.$^{11}$ This design is described briefly in Chapter 13.
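For $Q_0$ the envelope construction can be checked numerically. At the extrema of sin x/x the derivative condition gives tan x = x, so there $Q_0^2 = \cos^2 x = 1/(1 + x^2)$; with x = ω/2 and p = iω this equals $|2/(2+p)|^2$, the first entry of (6). The Python sketch below (an illustration only; the bisection helper is hypothetical) locates the first few roots of tan x = x and compares $|Q_0|$ there with $|2/(2 + i\omega)|$.

```python
import math

def extremum_of_sinc(k, tol=1e-14):
    """Root of tan x = x in (k*pi, k*pi + pi/2), by bisection on g(x) = sin x - x cos x."""
    lo, hi = k * math.pi + 1e-9, k * math.pi + math.pi / 2 - 1e-9
    g = lambda x: math.sin(x) - x * math.cos(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for k in range(1, 6):
    x = extremum_of_sinc(k)
    omega = 2 * x
    q0 = abs(math.sin(x) / x)                  # |Q_0| at the extremum
    envelope = abs(2 / (2 + 1j * omega))       # |y_0(i omega)| from (6)
    print(k, round(x, 6), q0, envelope)
```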
The third p-method is based upon the ascending power-series expansion of the transmission function corresponding to the best weighting function. Examples of such power series are given by (15) of Chapter 11. The method of approximation is one which is credited to Padé in O. Perron's "Kettenbrüchen." If the discussion in Section A.8 is referred to, it will be seen to be also a method of moments.
The method consists in determining the coefficients in a rational function of the form

$$\frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} \tag{7}$$

so that the ascending power-series expansion of the rational function will agree with that of the best transmission function, term for term, up to and including $p^{m+n}$. If the series for the best transmission function is

$$1 + c_1 p + c_2 p^2 + \cdots + c_{m+n}\,p^{m+n} + \cdots \tag{8}$$

the equations which determine the coefficients in (7) are obtained by equating coefficients of corresponding powers of p, up to and including the (m + n)th, in

$$(1 + b_1 p + \cdots + b_n p^n)\,(1 + c_1 p + \cdots + c_{m+n}\,p^{m+n})$$

and

$$1 + a_1 p + \cdots + a_m p^m .$$

The last n equations will be homogeneous in the b's and c's.
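The equations just described are linear and can be set up mechanically. The Python sketch below (NumPy assumed; function names hypothetical) computes the coefficients of (7) from the series coefficients of $y_j(p)$ for the Legendre weighting functions (12), using $c_k = (-1)^k\,(2j+1)!\,(k+j)!\,/\,[\,j!\,k!\,(k+2j+1)!\,]$, which for j = 2 reduces to (15). It follows the rule n − m = r + 1 discussed below, with r = j, since $w_j(\tau)$ behaves like $\tau^j$ near τ = 0. For j = 0 this gives 1/(1 + p/2), the same as $y_0$ in (6); the other cases need not coincide with (6).

```python
import numpy as np
from math import factorial

def y_series(j, terms):
    """Ascending power-series coefficients c_k of y_j(p) for the weighting function (12)."""
    return np.array([
        (-1) ** k * factorial(2 * j + 1) * factorial(k + j)
        / (factorial(j) * factorial(k) * factorial(k + 2 * j + 1))
        for k in range(terms)
    ])

def pade(c, m, n):
    """Coefficients (a, b) of (7), numerator degree m and denominator degree n.

    Matches the series (8) term for term through p^(m+n); assumes c[0] == 1.
    """
    # Powers m+1 .. m+n: sum_{i=0..n} b_i c_{k-i} = 0 with b_0 = 1
    M = np.array([[c[k - i] if k - i >= 0 else 0.0 for i in range(1, n + 1)]
                  for k in range(m + 1, m + n + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(M, -c[m + 1:m + n + 1])))
    # Powers 0 .. m: a_k = sum_{i=0..min(k,n)} b_i c_{k-i}
    a = np.array([sum(b[i] * c[k - i] for i in range(min(k, n) + 1)) for k in range(m + 1)])
    return a, b

for j in range(3):
    c = y_series(j, 12)
    num, den = pade(c, m=0, n=j + 1)       # rule n - m = r + 1 with r = j
    print(j, num, den)
```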
It has been expedient in some cases to omit the last few of the (m + n) equations in order to have some control over the number of real roots and poles and the number of conjugate pairs of complex roots and poles in the resulting rational function.
In the assumed rational expression (7) the difference n − m should be chosen so that the ultimate slope of the loss characteristic will be the same as for the best transmission function. According to the theorem in Section A.8, if W(t) behaves like $t^r$ as $t \to 0$, we should take n − m = r + 1. As a matter of experience, the rational expression has invariably turned out to be physically realizable whenever this "rule" was followed. Frequently, however, the rational expression has turned out to be physically realizable under small departures from the rule.

Examples of this method are given in Chapter 13.
12.3
USE OF FEEDBACK AMPLIFIERS AND SERVOMECHANISMS

In this section we shall describe the use of feedback amplifiers and servomechanisms to obtain desired transmission functions. For complete discussions of the most recent technical advances in the analysis and design of feedback amplifiers and servomechanisms the reader should consult some of the modern literature on these subjects.$^{2,\,3-5,\,15,\,16,\,17}$
Let us assume that we have two networks whose transmission functions are $Y_1(p)$ and $Y_2(p)$, respectively, as shown in Figure 3. For a signal E(t) applied to the first network the short-circuit output current is $I_1(t) = Y_1(p)\cdot E(t)$. For a signal V(t) applied to the second network the short-circuit output current is $I_2(t) = Y_2(p)\cdot V(t)$. With the networks sharing a common short-circuiting conductor as shown in Figure 4, the current through the conductor is $I_1 + I_2$. If the source which develops the voltage V(t) across the input terminals of the second network were in fact under the control of the current through the conductor, as shown schematically in Figure 5, in such a manner that it had to develop that voltage V(t) which reduces the current in the conductor to zero, then

$$Y_1(p)\cdot E(t) + Y_2(p)\cdot V(t) = 0 .$$

Hence, the transmission function (now a voltage-voltage ratio) of the arrangement shown in Figure 5 must be

$$Y(p) = -\frac{Y_1(p)}{Y_2(p)} . \tag{9}$$

Figure 3. Schematic representation of networks for feedback circuit application.

Figure 4. First step in combining networks.

Figure 5. Output voltage controlled by short-circuit current across intermediate terminals.
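A minimal numerical illustration of the assignment rule for (9), in Python with NumPy. It takes $y_1(p) = 12/(12 + 6p + p^2)$ from (6) of Section 12.2, which has a complex pair of poles and no finite roots, puts the complex poles in the numerator of $Y_2(p)$, and introduces a hypothetical redundant factor (p + 1)(p + 4) in both denominators. Only the algebra of (9) is checked here; the realizability of these particular $Y_1$ and $Y_2$, including the sign of $Y_1$, is not addressed.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Target with complex poles and no finite roots: 12 / (12 + 6p + p^2)
Y_num, Y_den = P([12.0]), P([12.0, 6.0, 1.0])

# Hypothetical redundant real factor (p + 1)(p + 4) shared by both networks
R = P([1.0, 1.0]) * P([4.0, 1.0])

Y1 = (-Y_num, R)     # numerator of Y1 carries the roots of Y (none here)
Y2 = (Y_den, R)      # numerator of Y2 carries the complex poles of Y

def value(net, p):
    """Evaluate a (numerator, denominator) pair at complex frequency p."""
    num, den = net
    return num(p) / den(p)

omega = np.array([0.1, 1.0, 3.0, 10.0])
p = 1j * omega
Y_fb = -value(Y1, p) / value(Y2, p)                 # equation (9)
print(np.max(np.abs(Y_fb - Y_num(p) / Y_den(p))))   # agreement to rounding error
print(Y_den.roots())                                # the complex pair -3 +/- j*sqrt(3)
```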
This relationship provides a method of obtaining transmission functions with complex poles without the requirement of coils.$^{c}$ The complex roots of Y(p) must be assigned to the numerator of $Y_1(p)$, and the complex poles of Y(p) to the numerator of $Y_2(p)$. Aside from this, the other roots and poles of Y(p) may be assigned in any way which is favorable to good design practice. Redundant factors may be introduced if they are desirable, as is done in the examples described in Sections 13.1.5 and 13.3.
The source of the voltage V(t) in Figure 5 does not have to be controlled by the current through the short-circuiting conductor. Since the current through any short circuit must be zero if the voltage across the short-circuited terminals is zero before the short circuit is connected across them, the source of the voltage V(t) may just as well be controlled by the open-circuit voltage, as shown in Figure 6. It is clear that the source of the voltage V(t) is ideally an infinite-gain amplifier. It is not necessary, however, that the amplifier have ideally unilateral transmission and infinite input and output impedances, since departures from these ideal characteristics may be compensated for in the design of the feedback network.
The simple result expressed by (9) may be readily modified to take account of the finite gain of a physical amplifier. The modification will be expressed as an extra factor which corresponds to the "μβ effect" or "μβ error"$^{16}$ commonly encountered in the theory and design of feedback amplifiers.

$^{c}$ This observation was first made by R. L. Dietzold.

Figure 6. Output voltage controlled by open-circuit voltage across intermediate terminals.
The exact transmission function of the circuit shown in Figure 6 is most simply expressed in terms of the following quantities:

$Y_1(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 1.
$Y_2(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 2.
$Z_2(p)$ = impedance between terminal-pair No. 2, with terminal-pair No. 3 shorted.
$Z_3(p)$ = impedance between terminal-pair No. 3, with amplifier dead, terminal-pair No. 1 shorted, and terminal-pair No. 2 open.
G(p) = transadmittance of amplifier.

Then

$$Y(p) = -\frac{Y_1}{Y_2}\cdot\frac{1 + Y_2/G}{1 - 1/(G\,Y_2 Z_2 Z_3)} . \tag{10}$$

The quantity $G\,Y_2 Z_2 Z_3$ is the μβ of the circuit. The quantity $Y_1 Y_2 Z_2 Z_3$, to which Y reduces when G = 0, represents the direct transmission of the circuit.
The active impedance across terminal-pair No. 2 is

$$Z_{2A} = \frac{Z_{2P}}{1 - G\,Y_2 Z_2 Z_3} \tag{11}$$

where

$$Z_{2P} = Z_2\,(1 + Y_2^2 Z_2 Z_3) . \tag{12}$$

$Z_{2P}$ is the passive impedance across terminal-pair No. 2. It differs from $Z_2$ in that terminal-pair No. 3 is open.
The exact expression (10) of the transmission function is useful chiefly as a check on the simpler but approximate expression (9). It is in general quite practicable to make the transadmittance or transconductance G of the amplifier large enough so that the μβ effect may be neglected.
In accordance with the sense in which the term "servomechanism" is used by MacColl,$^{4}$ a feedback circuit, such as that shown in Figure 6, is a servomechanism, more specifically an electronic servomechanism, since it operates on the ideal principle of maintaining zero voltage across terminal-pair No. 3. An electromechanical counterpart of the circuit shown in Figure 6 is shown in Figure 7. These circuits assume that the signal E(t) is a modulated d-c carrier.

[Figure 7: labels "Modulator" and "2-phase induction motor".]

Figure 7. Electromechanical counterpart of feedback amplifier circuit resulting in servomechanism.
If the signal is a modulated a-c carrier, "shaping" cannot be done conveniently by electrical networks. The difficulty may be avoided by various special devices. An example is described and illustrated in Section 13.4.
12.4
DESIGN OF RC NETWORKS
In this section we will describe and illustrate
two general methods of designing RC networks.
The first is most useful when the transmission
function is finite and not zero at zero fre-
quency; the second, when the transmission
function is zero at zero frequency. The case of a transmission function with a pole at zero frequency will not be considered, since it is covered by the methods described in the preceding section, in conjunction with the methods described below.
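Let
$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_{n+1} p^{n+1}}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \qquad (13)$$
with simple, real, negative poles. Dividing by p, expanding into partial fractions, and multiplying through by p, we get
$$Y(p) = \frac{a_{n+1}}{b_n}\,p + a_0 + \left(\frac{A_1 p}{p + \alpha_1} + \frac{A_2 p}{p + \alpha_2} + \cdots\right) - \left(\frac{B_1 p}{p + \beta_1} + \frac{B_2 p}{p + \beta_2} + \cdots\right)$$
where the A's, B's, α's, and β's are positive real quantities. The first term must be associated with those in the first parentheses if a_{n+1} > 0, and with those in the second parentheses if a_{n+1} < 0.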
The transmission function is now in the form
$$Y(p) = Y_A(p) - Y_B(p) \qquad (14)$$
where YA(p) and YB(p) are physically realizable driving-point admittances of RC type. Each term of the form pA/(p + α) is the admittance of the two-terminal, two-element network shown in Figure 8. Each term in (14) therefore represents a parallel combination of two-element networks of the type shown in Figure 8, together with a conductance a0 in the case of YA(p), and a capacitance |a_{n+1}|/b_n in the case of either YA(p) or YB(p). By well-known methods these two-terminal networks may be transformed into a variety of other configurations.

Figure 8. Simple RC network.
Figure 9. Method of realizing RC transmission functions, requiring phase inverter (labels: phase inverter; summing amplifier).
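The partial-fraction step that splits (13) into YA(p) − YB(p) can be sketched numerically. This is a minimal Python illustration, not the report's procedure: it assumes the denominator is supplied already factored as the product of (1 + p/α_i) with distinct positive α_i, and the function names and sample coefficients (a0 = 1, a1 = 1.5, a2 = 1, b1 = 1, which fall in the case a2/b1 < a1 < a2/b1 + a0b1) are hypothetical. Each positive residue A becomes a series R-C branch of YA with R = 1/A and C = A/α; each negative residue goes to YB.

```python
from math import prod

def poly_eval(coeffs, p):
    """Evaluate a0 + a1*p + a2*p**2 + ... (coefficients in ascending order)."""
    return sum(c * p**k for k, c in enumerate(coeffs))

def rc_partial_fractions(num, alphas):
    """Split Y(p) = N(p) / prod(1 + p/alpha_i) into
    a0 + c*p + sum(A_i * p / (p + alpha_i)), following (13).
    Assumes deg N = len(alphas) + 1 and distinct positive alphas."""
    a0 = poly_eval(num, 0.0)                  # conductance term, Y(0)
    b_n = prod(1.0 / a for a in alphas)       # leading coefficient of D(p)
    c = num[-1] / b_n                         # a_{n+1}/b_n, a capacitance
    residues = []
    for j, aj in enumerate(alphas):
        # D'(p) at p = -alpha_j, for D(p) = prod(1 + p/alpha_i)
        d_prime = (1.0 / aj) * prod(1.0 - aj / ai
                                    for i, ai in enumerate(alphas) if i != j)
        # residue of Y(p)/p at p = -alpha_j gives A_j (positive) or -B_j (negative)
        residues.append(poly_eval(num, -aj) / (-aj * d_prime))
    return a0, c, residues

def rc_branches(alphas, residues):
    """Each term A*p/(p + alpha) is a series R-C branch with R = 1/|A|, C = |A|/alpha.
    Positive residues go into Y_A, negative ones into Y_B."""
    ya, yb = [], []
    for alpha, A in zip(alphas, residues):
        branch = {"R": 1.0 / abs(A), "C": abs(A) / alpha}
        (ya if A > 0 else yb).append(branch)
    return ya, yb

def y_from_parts(p, a0, c, alphas, residues):
    """Rebuild Y(p) from the partial-fraction terms (for checking)."""
    return a0 + c * p + sum(A * p / (p + al) for A, al in zip(residues, alphas))

# Hypothetical numbers: Y(p) = (1 + 1.5p + p^2)/(1 + p)
num, alphas = [1.0, 1.5, 1.0], [1.0]
a0, c, res = rc_partial_fractions(num, alphas)
ya, yb = rc_branches(alphas, res)
print(a0, c, res)   # 1.0 1.0 [-0.5]
print(ya, yb)       # [] [{'R': 2.0, 'C': 0.5}]
```

The negative residue is what forces the lattice of Figure 10 (or the phase inverter of Figure 9) in this example; if every residue were positive, Y(p) would itself be an RC driving-point admittance.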
The transmission function (14) may be realized in the arrangement shown in Figure 9 or in that shown in Figure 10. The latter is a lattice network which is suitable only in a balanced-to-ground circuit. To obtain an unbalanced passive equivalent of this network we may resort to steps which will be described later in this section.

Figure 10. Lattice prototype for passive networks with RC transmission characteristics (labels: line branch; I = (YA − YB)E).
The second general method of designing RC networks is most useful when
$$Y(p) = p\,\frac{a_0 + a_1 p + \cdots + a_n p^n}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \qquad (15)$$
with simple, real, negative poles. Now, if the lattice in Figure 10 were driven from an infinite-impedance source of current $I_s$, the output current would be
$$I = I_s\,\frac{1 - Y_B/Y_A}{1 + Y_B/Y_A} .$$
If, furthermore,
$$\frac{Y_B}{Y_A} = \frac{k - Y(p)/p}{k + Y(p)/p} , \qquad (16)$$
then
$$I = I_s\,\frac{Y(p)}{kp} .$$
Taking it for granted for the moment that the lattice can be transformed as shown schematically in Figure 11, we may then discard the condenser across the output terminals and, by Thevenin's theorem,1 we may replace the condenser across the input terminals and the infinite-impedance current source by a series condenser and a zero-impedance voltage source. The result is shown in Figure 12. Since
$$I_s = pCE ,$$
we now have
$$I = \frac{C}{k}\,Y(p)\,E ,$$
which is the desired result, to a constant factor.
The factor k should in general be taken as small as possible subject to the requirement that all the roots and poles of (16) be simple, real, and negative. It can always be taken large enough to fulfill this requirement, since as k increases the roots and poles of (16) approach the poles of Y(p)/p, which are simple, real, and negative by hypothesis. A suitable value may easily be chosen by inspection of a plot of Y(p)/p for negative real values of p: the roots of (16) lie where Y(p)/p = k, and its poles where Y(p)/p = −k.

Figure 11. Step in transformation of networks with zero transmission at zero frequency.
Figure 12. Final step in transformation of networks with zero transmission at zero frequency.
The numerator and denominator of (16) are of equal degree and therefore contain the same number of linear factors. These factors may be assigned to YA or to YB arbitrarily, except that YA and YB must be physically realizable driving-point admittance functions which behave ultimately like condensers as the frequency increases indefinitely; that is, roots and poles must alternate and there must be a simple pole at infinity.
There are five kinds of steps which may be
taken to transform a lattice into an unbalanced
form. These steps are based upon Bartlett's
bisection theorem,14 and may be taken in any
order and as often as necessary. Each of them
will now be described as it would be applied
directly to Figure 10. In the following diagrams
a lattice enclosed in a rectangle means an un-
balanced network whose configuration may not
be known yet, but whose lattice prototype is as
indicated.
1. Shunt network pulled out of both branches: shown in Figure 13.
2. Shunt network pulled out of the line branch only: shown in Figure 14.
3. Series network pulled out of both branches: shown in Figure 15.c
4. Series network pulled out of the lattice branch only: shown in Figure 16.c
5. Breakdown into parallel lattices: a fairly obvious step which need not be illustrated.

Figure 13. Step in transformation of lattice; shunt networks pulled out of both branches.
Figure 14. Step in transformation of lattice; shunt network pulled out of line branch only.
Figure 15. Step in transformation of lattice; series networks pulled out of both branches.
Figure 16. Step in transformation of lattice; series network pulled out of lattice branch only.

c Given in impedance form.
As an example of (13) consider
$$Y(p) = \frac{a_0 + a_1 p + a_2 p^2}{1 + b_1 p}$$
where all the coefficients are positive. Since
$$Y(p) = \frac{a_2}{b_1}\,p + a_0 - \frac{(a_2/b_1 + a_0 b_1 - a_1)\,p}{1 + b_1 p}$$
there is no problem if a1 > (a2/b1) + a0b1. But if a1 < (a2/b1) + a0b1 we have the problem of transforming the lattice in Figure 17. We can apply steps 2 and 4 immediately, but find that the residual lattice cannot be transformed unless a1 > a2/b1. Under this additional restriction we can apply step 3, obtaining finally the network shown in Figure 18.

Figure 17. Illustrative lattice prototype.
As an example of (15) consider
$$Y(p) = \frac{p\,(1 + 12p)}{1 + 24p + 64p^2} .$$
Taking k = 1 (the smallest value which may be assigned), we get
$$\frac{Y_B}{Y_A} = \frac{2p\,(3 + 16p)}{(1 + 2p)(1 + 16p)} .$$
One way of choosing YA and YB is
$$Y_A = \frac{(1 + 2p)(1 + 16p)}{2\,(3 + 16p)}, \qquad Y_B = p .$$
This leads finally to the network shown in Figure 19. Such a simple network is possible, of course, because Y(p) happens to satisfy the requirements of a physically realizable driving-point admittance function. However, another way of choosing YA and YB is
$$Y_A = \frac{1 + 2p}{2}, \qquad Y_B = \frac{p\,(3 + 16p)}{1 + 16p} .$$
This leads to the network shown in Figure 20.

Figure 18. Unbalanced equivalent of illustrative lattice prototype when a2/b1 < a1 < (a2/b1) + a0b1.
Figure 19. RC network with zero transmission at zero frequency.
Figure 20. Another RC network with zero transmission at zero frequency (element values shown include C0 = 1, R0 = 2, R1 = 3, C1 = 4).
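A quick numerical check of (16) on this example: with k = 1 and either choice of YA and YB, the lattice current ratio (YA − YB)/(YA + YB) should equal Y(p)/(kp). The sketch assumes Y(p) = p(1 + 12p)/(1 + 24p + 64p²), which is the function implied by the stated YB/YA with k = 1 under the normalization of (15); the complex test frequencies are arbitrary. It also checks why k = 1 is the smallest admissible value: for k < 1 one root of (16) moves onto the positive real axis.

```python
import cmath

def Y(p):
    """Assumed example of (15): p(1 + 12p)/(1 + 24p + 64p^2)."""
    return p * (1 + 12 * p) / (1 + 24 * p + 64 * p**2)

def lattice_current_ratio(ya, yb):
    """I/I_s for the lattice driven by a current source, output short-circuited."""
    return (ya - yb) / (ya + yb)

choices = {
    "first":  (lambda p: (1 + 2*p) * (1 + 16*p) / (2 * (3 + 16*p)), lambda p: p),
    "second": (lambda p: (1 + 2*p) / 2, lambda p: p * (3 + 16*p) / (1 + 16*p)),
}
k = 1.0
for name, (YA, YB) in choices.items():
    for p in (0.5j, 1 + 2j, 3.0):
        assert abs(lattice_current_ratio(YA(p), YB(p)) - Y(p) / (k * p)) < 1e-12

def critical_points(k):
    """Roots of k*D - N and k*D + N (zeros and poles of (16)) for this example,
    with N = 1 + 12p and D = 1 + 24p + 64p^2."""
    pts = []
    for s in (-1, 1):
        a, b, c = 64 * k, 24 * k + 12 * s, k + s
        disc = cmath.sqrt(b * b - 4 * a * c)
        pts += [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return pts

def admissible(k, tol=1e-12):
    """True if every zero and pole of (16) is real and not positive."""
    return all(abs(r.imag) < tol and r.real <= tol for r in critical_points(k))

print(admissible(1.0), admissible(0.9))   # True False
```

At k = 1 the zeros of (16) fall at p = 0 and p = −3/16 and its poles at p = −1/16 and p = −1/2, which are exactly the factors 2p(3 + 16p) and (1 + 2p)(1 + 16p) distributed between YA and YB above.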
CONFIDENTIAL
Chapter 13
ILLUSTRATIVE DESIGNS AND PERFORMANCE ANALYSIS
THE ILLUSTRATIVE material described in this chapter is taken from four practical applications.
1. Second-derivative circuit for the M9 anti-
aircraft director.
2. Position data smoother for the "close sup-
port plotting board," with delay correction for
constant velocity aircraft.
3. Position and rate circuit for the "com-
puter for controlling bombers from the
ground," with optional delay correction of posi-
tion data for constant-velocity aircraft.
4. Position and rate circuit using electromechanical servomechanisms.
The design and analytical procedure used in
the first application has not heretofore been
described in writing. Hence, considerably more
space will be devoted to it than to the other
three applications. The latter have been de-
scribed in detail in reports.1" 1; 13
13.1 SECOND-DERIVATIVE CIRCUIT DESIGN
13.1.1 Realizable Approximation of Best Transmission Function
The best transmission function for the second-derivative circuit was taken to be
$$Y(p) = p^2\,y_2(p) ,$$
in the notation of Chapter 11. This assumes flat random noise in position data and, arbitrarily, 1-second smoothing and settling time. The series expansion of y2(p) is, according to expressions (15) of Chapter 11,
$$y_2(p) = 1 - \frac{1}{2}p + \frac{1}{7}p^2 - \frac{5}{168}p^3 + \frac{5}{1008}p^4 - \cdots$$
The form of the rational approximation,
$$y(p) = \frac{1}{1 + b_1 p + b_2 p^2 + b_3 p^3 + b_4 p^4} ,$$
was chosen for simplicity under the requirement that the transmission function p²y(p) should cut off at the rate of 12 db per octave.a
This requirement was set as a precaution
against noise due to granularity of the coordi-
nate-conversion potentiometers in the director.
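Following the procedure outlined in Section 12.2, the following equations were obtained:
$$b_1 - \tfrac{1}{2} = 0$$
$$b_2 - \tfrac{1}{2}b_1 + \tfrac{1}{7} = 0$$
$$b_3 - \tfrac{1}{2}b_2 + \tfrac{1}{7}b_1 - \tfrac{5}{168} = 0$$
$$b_4 - \tfrac{1}{2}b_3 + \tfrac{1}{7}b_2 - \tfrac{5}{168}b_1 + \tfrac{5}{1008} = 0$$
whence
$$b_1 = \tfrac{1}{2}, \qquad b_2 = \tfrac{3}{28}, \qquad b_3 = \tfrac{1}{84}, \qquad b_4 = \tfrac{1}{1764} .$$
Since
$$y(p) = \frac{1764}{p^4 + 21p^3 + 189p^2 + 882p + 1764} = \frac{1764}{\left(p^2 + \frac{21 + \sqrt{21}}{2}\,p + 42\right)\left(p^2 + \frac{21 - \sqrt{21}}{2}\,p + 42\right)} ,$$
y(p) would have two conjugate pairs of complex poles, viz.,
$$p = -6.40 \pm i1.047, \quad -4.10 \pm i5.02,$$
of which one pair is very nearly real.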
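The coefficient equations and the pole locations can be checked with exact rational arithmetic. This minimal Python sketch uses only the series of y2(p) quoted above: solving the triangular system term by term reproduces b1 to b4, and the two quadratic factors give poles at about −6.40 ± i1.047 and −4.10 ± i5.02. The helper name `match_denominator` is ours, not the report's.

```python
from fractions import Fraction as F
import cmath
import math

# Series of the best function y2(p) through p^4, as given in the text
y2_series = [F(1), F(-1, 2), F(1, 7), F(-5, 168), F(5, 1008)]

def match_denominator(series):
    """Return [1, b1, ..., bn] such that
    (1 + b1 p + ... + bn p^n) * series = 1 + O(p^(n+1)); assumes series[0] == 1."""
    b = [F(1)]
    for k in range(1, len(series)):
        # the coefficient of p^k in the product must vanish
        b.append(-sum(b[j] * series[k - j] for j in range(k)))
    return b

b = match_denominator(y2_series)
print([str(x) for x in b[1:]])             # ['1/2', '3/28', '1/84', '1/1764']
print([int(x * 1764) for x in b])          # [1764, 882, 189, 21, 1]

# Poles from the two quadratic factors p^2 + c p + 42
poles = []
for c in ((21 + math.sqrt(21)) / 2, (21 - math.sqrt(21)) / 2):
    disc = cmath.sqrt(c * c - 4 * 42)
    poles += [(-c + disc) / 2, (-c - disc) / 2]
print([f"{z.real:.2f}{z.imag:+.3f}i" for z in poles])
```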
In order to simplify the circuit design, however, it was desirable to limit the number of complex poles to a single conjugate pair. This was accomplished by leaving b4 arbitrary, so that the denominator of y(p) was
$$1 + \tfrac{1}{2}p + \tfrac{3}{28}p^2 + \tfrac{1}{84}p^3 + b_4 p^4 .$$
A value for b4 which would make this expression vanish at two negative real values of p was found by plotting
$$1764\,b_4 = \frac{21}{x^4}\left(x^3 - 9x^2 + 42x - 84\right)$$
against x (here p = −x), as shown in Figure 1. The right-hand member is positive only in the range x > 3.77 and has a maximum of 0.982 at about x = 6.63.

Figure 1. Graphical determination of b4 (1764 b4 plotted against x).

a The design antedated the formulation of the n − m = r + 1 rule given in Section 12.2, according to which the best transmission function should have been taken as p²y3(p) in the notation of Chapter 11. However, no trouble was experienced in obtaining a physically realizable approximation of the complexity assumed.
In order to obtain a substantial separation between the two real poles of y(p), the value 1764 b4 = 0.5 was chosen. The approximation
$$y(p) = \frac{1}{1 + \tfrac{1}{2}p + \tfrac{3}{28}p^2 + \tfrac{1}{84}p^3 + \tfrac{1}{3528}p^4}$$
has poles at
$$p = -4.17391, \quad -31.72813, \quad -3.04898 \pm i4.16463 .$$
The series expansion of y(p) agrees with that of y2(p) to four terms, the fifth term being (37/7056)p⁴ instead of (5/1008)p⁴. The difference in the fifth term is less than 6 per cent.
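The graphical choice of b4 and the resulting poles can be reproduced numerically. This plain-Python sketch (bisection only, no external libraries; the bracketing intervals are choices made for the sketch) locates where the right-hand member of the plotting equation crosses zero and where it peaks (the computed peak lies near x = 6.65, consistent with the graphical "about 6.63"), solves f(x) = 0.5 for the two real poles, deflates the quartic to obtain the complex pair, and compares the fifth series coefficients exactly.

```python
import math
from fractions import Fraction as F

def f(x):
    """1764*b4 that makes 1 + p/2 + 3p^2/28 + p^3/84 + b4 p^4 vanish at p = -x."""
    return 21.0 * (x**3 - 9 * x**2 + 42 * x - 84) / x**4

def bisect(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi]; assumes g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# f > 0 only beyond the single real root of the cubic in its numerator
x_zero = bisect(lambda x: x**3 - 9 * x**2 + 42 * x - 84, 1.0, 10.0)
# f'(x) = 0 reduces to x^3 - 18x^2 + 126x - 336 = 0, which has one real root
x_max = bisect(lambda x: x**3 - 18 * x**2 + 126 * x - 336, 1.0, 20.0)
print(round(x_zero, 2), round(x_max, 2), round(f(x_max), 3))

# 1764*b4 = 0.5: the two real poles are at p = -x where f(x) = 0.5
x1 = bisect(lambda x: f(x) - 0.5, x_zero, x_max)
x2 = bisect(lambda x: f(x) - 0.5, x_max, 200.0)

# Remove (p + x1)(p + x2) from p^4 + 42p^3 + 378p^2 + 1764p + 3528
s, m = x1 + x2, x1 * x2
u = 42 - s                     # remaining quadratic: p^2 + u p + v
v = 378 - m - s * u
re, im = -u / 2, math.sqrt(v - u * u / 4)
print(f"{-x1:.5f}, {-x2:.5f}, {re:.5f} +/- i{im:.5f}")

# Fifth series term, realized (b4 = 1/3528) versus best (b4 = 1/1764);
# changing b4 only affects the p^4 and higher coefficients of 1/D(p)
c4_best = F(5, 1008)
c4_real = c4_best + (F(1, 1764) - F(1, 3528))
print(c4_real, float((c4_real - c4_best) / c4_best))   # 37/7056 and a ratio of about 0.057
```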
The realized approximation and the best
weighting function are shown in Figure 3.
13.1.2 Transient Responses
The responses of the physical network whose transmission function is p²y(p) are compared to those of the best network whose transmission function is p²y2(p) in Figures 2, 3, and 4. The signals for which (and the formulas by which) these responses were computed are tabulated below.

Figure   Signal E(t)         Response formulas
         t < 0   t ≥ 0       Realized          Best (0 ≤ t ≤ 1)
2        0       1           L⁻¹[p y(p)]       60t(1 − 2t)(1 − t)
3        0       t           L⁻¹[y(p)]         30t²(1 − t)²
4        0       t²/2        L⁻¹[y(p)/p]       t³(10 − 15t + 6t²)
It has been noted that Figure 3 also repre-
sents the best and the realized weighting func-
tions.
Figure 2. Responses to step function, viz., E(t) = 1 when t > 0 (best and realized curves).
Figure 3. Responses to linear ramp function, viz., E(t) = t when t > 0; second-derivative smoothing functions (best and realized curves).
Figure 4. Responses to parabolic ramp function, viz., E(t) = (½)t² when t > 0; second-derivative settling characteristics.
If a signal of the form
$$E(t) = a_0 + a_1 t + \tfrac{1}{2}a_2 t^2$$
were to be applied suddenly to the second-derivative circuit at t = 0, the response would be
$$V(t) = \frac{a_0}{T^2}\,A_0\!\left(\frac{t}{T}\right) + \frac{a_1}{T}\,A_1\!\left(\frac{t}{T}\right) + a_2\,A_2\!\left(\frac{t}{T}\right)$$
where A0, A1, A2 stand for the responses shown in Figures 2, 3, and 4, respectively, and where t is the time in seconds and T is the nominal smoothing time. The response V(t) is the indicated acceleration of the target.
The sudden application of the instantaneous
position and velocity components of the signal
to the second-derivative circuit will give rise to
some very serious consequences unless special
measures are taken to mitigate them. To see
this let it be assumed that T = 20 seconds and that the target is at such a range that a0 = 20,000 yards when the signal E(t) is applied to the second-derivative circuit. Each unit of A0 in the ordinate scale of Figure 2 then represents an indicated acceleration of a0/T² = 50 yd per sec². Referring to Figure 2 it is clear not only
that the effective settling time will be several
times the smoothing time but also that the indi-
cated acceleration will go through exceedingly
large maxima.
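The scaling of V(t) can be made concrete with the best-network response formulas 60t(1 − 2t)(1 − t), 30t²(1 − t)², and t³(10 − 15t + 6t²) tabulated in Section 13.1.2. This sketch uses those best-network curves, not the realized network plotted in Figure 2 (whose responses require the inverse transforms of y(p)), so it only illustrates the order of magnitude: with T = 20 s and a0 = 20,000 yd, the position term alone drives the best network's indicated acceleration to a peak of 50 × 10/√3, about 289 yd per sec².

```python
import math

def A0(t):
    """Best-network response to a unit step (Figure 2), for T = 1."""
    return 60 * t * (1 - 2 * t) * (1 - t) if 0 <= t <= 1 else 0.0

def A1(t):
    """Best-network response to a unit ramp (Figure 3), for T = 1."""
    return 30 * t**2 * (1 - t)**2 if 0 <= t <= 1 else 0.0

def A2(t):
    """Best-network response to the parabolic ramp t^2/2 (Figure 4), for T = 1."""
    if t < 0:
        return 0.0
    return t**3 * (10 - 15 * t + 6 * t**2) if t <= 1 else 1.0

def indicated_acceleration(t, a0, a1, a2, T):
    """V(t) for E(t) = a0 + a1 t + a2 t^2/2 applied at t = 0 to a T-second network."""
    x = t / T
    return a0 / T**2 * A0(x) + a1 / T * A1(x) + a2 * A2(x)

T, a0 = 20.0, 20000.0
print(a0 / T**2)                                   # 50.0 (yd/sec^2 per unit of A0)
peak = max(indicated_acceleration(0.001 * i, a0, 0.0, 0.0, T) for i in range(20001))
print(round(peak, 1), round(50 * 10 / math.sqrt(3), 1))   # grid peak vs 50 * 10/sqrt(3)
```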
Exceedingly large transient responses are
not peculiar to second-derivative circuits. They
occur also in first-derivative circuits in linear
prediction, where they are due entirely to the
initial position term in the signal. In all cases
they are reduced to harmless proportions by
special arrangements of the circuits during the
operation of slewing.
13.1.3 Effect of Tracking Errors on Accuracy of Prediction
The statistical effect of tracking errors on the accuracy of prediction is most readily determined from the power spectrum of the tracking errors and the transmission function of the prediction circuit.
Table 1 gives the values of the transmission function Y1 of the first-derivative circuit in the M9 director, referred to a nominal smoothing time of 1 second, and the transmission function Y2 of the experimental second-derivative circuit design, also referred to a nominal smoothing time of 1 second. The transmission function of the linear prediction circuit with 10-second smoothing of first derivative is then
$$Y_L(p) = 1 + \frac{G_1}{10}\,Y_1(10p)$$
while that of the quadratic prediction circuit with 20-second smoothing of second derivative is
$$Y_Q(p) = 1 + \frac{G_1}{10}\,Y_1(10p) + \frac{G_2}{400}\,Y_2(20p)$$
where G1 and G2 are determined in accordance with the discussion in Section A.10. Since
$$Y_1(p) = p\,(1 - 0.3724p + \cdots), \qquad Y_2(p) = p^2\,(1 - \cdots),$$
we get
$$G_1 = t_f, \qquad G_2 = t_f\left(\tfrac{1}{2}t_f + 3.724\right).$$

Table 1*
ω      Re Y1    Im Y1    Re Y2     Im Y2
1      0.174    0.666    -0.454     0.165
2      0.651    1.166    -1.442     1.212
3      1.312    1.358    -2.014     3.527
4      1.943    1.203    -1.069     6.688
5      2.382    0.821     2.000     9.409
6      2.599    0.364     6.575    10.115
7      2.637   -0.067    10.893     8.220
8      2.558   -0.429    13.468     4.695
9      2.416   -0.711    14.096     0.953
10     2.242   -0.920    13.401    -2.092
11     2.062   -1.070    12.064    -4.320
12     1.885   -1.172    10.530    -5.777
13     1.720   -1.238     9.027    -6.704
14     1.566   -1.279     7.652    -7.169
15     1.429   -1.299     6.438    -7.398
16     ...      ...       5.382    -7.446
17     ...      ...       4.471    -7.374
18     1.096   -1.286     3.683    -7.221
19     1.004   -1.268     3.015    -7.025
20     0.926   -1.247     2.436    -6.795
22     0.790   -1.198     1.509    -6.292
24     0.683   -1.145     0.818    -5.780
26     0.593   -1.091     0.301    -5.287
28     0.518   -1.040    -0.088    -4.828
30     0.457    ...      -0.380    -4.402
32     0.407   -0.945    -0.599    -4.016
34     0.364   -0.902    -0.762    -3.666
36     0.326   -0.862    -0.881    -3.348
38     0.296   -0.825    -0.967    -3.062
40     0.266   -0.790    -1.026    -2.800
* ω is in c when smoothing time T = 1 sec. For T-second networks, values of ω should be divided by T, values of Y1 should be divided by T, and values of Y2 should be divided by T². The two networks may have different values of T.
Table 2 gives the values of |YL(iω)|² and of |YQ(iω)|² for tf = 5, 10, 15, 20 seconds. These are plotted in Figures 5, 6, 7, and 8.
The last column of Table 2 and Figure 9 give the power spectrum of a composite of the range and transverse errors in a typical run. The power contained in the frequency range covered by the table accounts for 78 per cent of the total power, or an rms error of 15.8 yards out of 17.9 yards.
The rms error of prediction is the square root of the power transmitted by the prediction circuit. This is tabulated on the last line of Table 2 and in the smaller table following.

Figure 5. Power transmission ratio of linear and quadratic prediction circuits with 5-second prediction time.
Figure 6. Power transmission ratio of linear and quadratic prediction circuits with 10-second prediction time.
Table 2
             tf = 5 sec     tf = 10 sec     tf = 15 sec      tf = 20 sec
ω           |YL|²  |YQ|²    |YL|²  |YQ|²    |YL|²  |YQ|²     |YL|²  |YQ|²      P*
0           1.00   1.00     1.00   1.00     1.00   1.00      1.00    1.00     31.4
1           1.29   1.13     1.82   1.60     2.59   2.71      3.59    4.81     33.5
2           2.10   2.76     4.08   8.90     6.97  23.16     10.74   50.35     35.7
3           3.20   6.85     7.19  26.73    12.96  72.51     20.51  159.43     19.7
4           4.2   10.0     10.1   39.5     18.6  106.1      29.76  231.3       3.6
5           5.0   10.5     12.1   39.9     22.4  104.4      35.9   223.9       2.5
6           5.3    9.8     13.1   35.6     24.3   90.6      38.9   190.6       1.2
7           5.4    8.8     13.2   30.8     24.6   76.6      39.4   158.4       1.6
8           5.2    7.9     12.8   26.6     23.8   64.7      38.2   131.8       2.1
9           5.0    7.1     12.2   23.0     22.5   55.0      36.0   110.6       1.4
10          4.7    6.3     11.4   20.0     21.0   47.0      33.5    93.5       0.7
11          4.4    5.7     10.5   17.5     19.3   40.4      30.8    79.6       0.8
12          4.1    5.1      9.7   15.3     17.7   35.0      28.3    68.2       0.8
13          3.8    4.6      8.9   13.5     16.3   30.4      25.8    58.9       0.5
14          3.6    4.2      8.2   12.1     14.9   27.1      23.6    52.0       0.3
15          3.4    3.8      7.6   10.6     13.7   23.4      21.6    44.5       0.8
16          3.2    3.5      7.0    9.5     12.6   20.6      19.8    39.0       1.1
17          3.0    3.2      6.5    8.5     11.6   18.3      18.2    34.4       0.8
18          2.8    3.0      6.0    7.7     10.7   16.3      16.8    30.4       0.4
19          2.7    2.8      5.6    7.0      9.9   14.6      15.5    27.0       0.7
20          2.5    2.6      5.3    6.3      9.2   13.1      14.4    24.1       1.0
rms error
of prediction 23.9  29.5   33.9   53.4     44.5   85.4      55.4   125.0
* P is in units of 180 yd² per c.
Time of flight    Rms error of prediction due to
in seconds        tracking errors, in yards
                  Linear     Quadratic
5                 23.9        29.5
10                33.9        53.4
15                44.5        85.4
20                55.4       125.0
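The rms error of prediction is the square root of the integral of P(ω)|Y(iω)|² over frequency. Here is a sketch of that computation on the tabulated values, using the trapezoidal rule on the 21 tabulated points. The units of P and of the frequency step are not fully legible in the extracted table, so the overall scale constant K is an assumption. With K = 2 the sketch reproduces the tabulated rms values, and the 15.8-yard figure for the unfiltered in-band spectrum, to within about 0.5 per cent.

```python
import math

# Table 2, rows omega = 0, 1, ..., 20 (frequency units as in the table)
P = [31.4, 33.5, 35.7, 19.7, 3.6, 2.5, 1.2, 1.6, 2.1, 1.4, 0.7,
     0.8, 0.8, 0.5, 0.3, 0.8, 1.1, 0.8, 0.4, 0.7, 1.0]
gain_sq = {
    ("linear", 5):     [1.00, 1.29, 2.10, 3.20, 4.2, 5.0, 5.3, 5.4, 5.2, 5.0, 4.7,
                        4.4, 4.1, 3.8, 3.6, 3.4, 3.2, 3.0, 2.8, 2.7, 2.5],
    ("quadratic", 10): [1.00, 1.60, 8.90, 26.73, 39.5, 39.9, 35.6, 30.8, 26.6, 23.0, 20.0,
                        17.5, 15.3, 13.5, 12.1, 10.6, 9.5, 8.5, 7.7, 7.0, 6.3],
    ("linear", 15):    [1.00, 2.59, 6.97, 12.96, 18.6, 22.4, 24.3, 24.6, 23.8, 22.5, 21.0,
                        19.3, 17.7, 16.3, 14.9, 13.7, 12.6, 11.6, 10.7, 9.9, 9.2],
}
K = 2.0  # assumed scale constant (units of P and of the frequency step not legible)

def trapezoid(values, step=1.0):
    """Trapezoidal-rule integral of equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def rms_error(power, g2, k=K):
    """Square root of the power transmitted by a circuit with |Y|^2 = g2."""
    return math.sqrt(k * trapezoid([p * g for p, g in zip(power, g2)]))

in_band = rms_error(P, [1.0] * len(P))
print(round(in_band, 1), round((in_band / 17.9) ** 2, 2))   # in-band rms, fraction of total power
for key, g2 in gain_sq.items():
    print(key, round(rms_error(P, g2), 1))
```

Because the table is sampled at unit spacing and rounded to a few figures, agreement at the level of a few tenths of a yard is about the best one can expect from this reconstruction.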
It is obviously relatively disadvantageous to
use quadratic prediction when the target is in
fact flying a rectilinear unaccelerated course.
Figure 7. Power transmission ratio of linear and quadratic prediction circuits with 15-second prediction time.
Figure 8. Power transmission ratio of linear and quadratic prediction circuits with 20-second prediction time.
The relative advantage of linear prediction
should persist for target paths with only a
slight amount of curvature, but this relative
advantage should decrease as the curvature is
increased. When the curvature exceeds a cer-
tain amount, the relative advantage should
shift to quadratic prediction.
The determination of the minimum value of target path curvature at which quadratic prediction becomes relatively advantageous depends not only upon:
1. dispersion of the predicted point of impact due to tracking errors,
but also upon a number of other factors, among which are:
2. actual future position of target with respect to the predicted point of impact, assuming an accurate computer and the absence of all sources of dispersion enumerated here;e
3. dispersion due to inaccuracies in the computer and data-transmission systems;
4. dispersion due to noise in the computer and data-transmission systems;
5. dispersion due to variations in actual dead time;
6. dispersion due to gun wear and to variations in powder charge, shell weight, shell shape, etc.;
7. dispersion due to variations in meteorological conditions along the path of the shell;
8. dispersion due to variability of time-fuze calibration; and
9. lethal pattern of shell burst.
In a special illustrative case, a numerical analysis, including most of these factors (estimated), showed that quadratic prediction becomes relatively advantageous when the target acceleration exceeds about 0.1g. However, this should not be taken as a general result.

Figure 9. Composite power spectrum of tracking errors of experimental radar.

e This is considered in detail in the next section.
13.1.4 Linear and Quadratic Prediction Errors on Constant-Velocity Circular Courses
The use of a finite number of derivatives of
the tracking data for purposes of prediction is
itself a source of prediction errors even if there
were no tracking errors. Definite evaluation of
these prediction errors can be made only if the
path of the target is prescribed. The simplest
path which can be prescribed for this purpose
is a circular one at constant velocity. Such a
path is fairly realistic when considered in rela-
tion to the difficulty of maneuvering a bomber
and to actual records of the paths of hostile
bombers over London during World War II.
The position of a target flying in a circle at constant velocity, referred to the center of the circle, is expressed by the complex quantity $Re^{i\omega t}$, where R is the radius of the circle and ω is the angular rate. In terms of the velocity V and the transverse acceleration A, we have R = V²/A and ω = A/V. The predicted position is then at $RY(i\omega)e^{i\omega t}$, where Y(iω) is the transmission function of the prediction circuit. The true future position of the target, however, is at $R\exp[i\omega(t + t_f)]$. Hence, the prediction error, referred to axes fixed on the target and oriented respectively transverse to and in the direction of the present velocity, is
$$\epsilon = R\left[Y(i\omega) - e^{i\omega t_f}\right] .$$
As an illustration let us consider a case in which V = 150 yd per sec, A = 5 yd per sec², and tf = 10 sec, so that R = 4500 yd and ω = 1/30 radian per sec. For the linear prediction circuit
$$Y_L(i\omega) = 1.0409 + i0.3296$$
and for the quadratic prediction circuit
$$Y_Q(i\omega) = 0.9501 + i0.3610$$
while
$$e^{i\omega t_f} = 0.9450 + i0.3272 .$$
Hence, when the present position of the target is at 4500 + i0 with respect to the center of the circle, the linear predicted point is at 4684 + i1483, the quadratic predicted point is at 4276 + i1624, while the true future position is at 4252 + i1472. These are shown in Figure 10. The prediction error vectors are
$$\epsilon_L = 432 + i11, \qquad |\epsilon_L| = 432$$
$$\epsilon_Q = 24 + i152, \qquad |\epsilon_Q| = 154$$
Referring to Figure 10 it may be observed that if the first-derivative component of the quadratic prediction were to be reduced by approximately 10 per cent a nearly perfect hit would be obtained. This suggests the possibility of determining empirical functions of the time of flight for the potentiometer factors G1 and G2 in order to improve the probability of kill. This would involve consideration of all of the sources of dispersion enumerated in the preceding section as well as a statistical study of target paths. Such a determination has not been attempted.

Figure 10. Vector diagram of linear and quadratic prediction for constant-velocity circular courses (labels: present position; center of turn; first-derivative and second-derivative vectors; linear and quadratic predicted positions; true future position, 10 sec).
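The circular-course example can be recomputed directly from the quantities given in the text. This is a minimal sketch. The 10 per cent adjustment at the end assumes that the first-derivative component of the quadratic prediction is YL(iω) − 1, that is, that the quadratic circuit shares the linear circuit's first-derivative term. That assumption is ours, made for illustration.

```python
import cmath

V, A, tf = 150.0, 5.0, 10.0        # yd/sec, yd/sec^2, sec
R = V**2 / A                       # radius of turn, yd
omega = A / V                      # angular rate, radians/sec

Y_lin = complex(1.0409, 0.3296)    # linear prediction circuit at p = i*omega
Y_quad = complex(0.9501, 0.3610)   # quadratic prediction circuit at p = i*omega
future = cmath.exp(1j * omega * tf)

def prediction_error(Y):
    """R[Y(i omega) - exp(i omega tf)]; real part transverse to the present
    velocity, imaginary part along it."""
    return R * (Y - future)

eps_lin = prediction_error(Y_lin)
eps_quad = prediction_error(Y_quad)
print(R, round(abs(eps_lin)), round(abs(eps_quad)))   # 4500.0 432 154

# Assumption: the quadratic prediction contains the same first-derivative term
# (Y_lin - 1) as the linear one; shrink that term by 10 per cent.
Y_adj = Y_quad - 0.1 * (Y_lin - 1)
print(round(abs(prediction_error(Y_adj)), 1))         # a few yards
```

Applying the same 10 per cent reduction to the linear prediction instead still leaves a miss of roughly 400 yards. This is why the remark about a nearly perfect hit is read here as applying to the quadratic prediction.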
13.1.5 Physical Configuration of the Second-Derivative Circuit
In this section we shall derive a physical configuration for the second-derivative circuit. In particular it illustrates the application of feedback to the realization of weighting functions or impulsive admittances involving complex exponentials in general.f It should be pointed out, however, that the application of feedback to the end in view is not restricted to purely electronic circuits. An application involving the use of servomechanisms will be described in Section 13.4.

f Originally proposed by R. L. Dietzold.
The transmission function which concerns us here may be expressed in the partially factored form
$$Y(p) = \frac{p^2}{(p + 0.2087)(p + 1.5864)(p^2 + 0.3049p + 0.06660)}$$
where the poles have been adjusted to correspond to T = 20 seconds and where a constant factor has been left out.
The circuit is to be designed to work out of the amplifier in the first-derivative circuit of the M9 director. Since this much of the first-derivative circuit has a transmission function of the form p/(p + 0.24), the transmission function which we have to realize is Yi(p)/Yf(p), where
$$Y_i(p) = \frac{p}{(p + 0.2087)(p + 1.5864)}$$
and
$$Y_f(p) = \frac{p^2 + 0.3049p + 0.06660}{p + 0.24} .$$
The inversion of the factor corresponding to Yf(p) is in accordance with the fact that the transmission gain through a feedback amplifier is equal to the loss in the feedback network, provided the feedback is very large. To realize the transmission function Yi(p)/Yf(p) it is therefore necessary only to realize the transmission functions Yi(p) and Yf(p) individually. The corresponding networks are shown in Figure 11, with typical element values.

Figure 11. Physical configuration of quadratic prediction circuit for modified M9 AA director (labels include the smoothing network).
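The scaling to T = 20 s and the split into Yi/Yf can be checked numerically. Dividing the T = 1 poles of Section 13.1.1 by 20 gives the factors above. Cascading p/(p + 0.24), Yi(p), and 1/Yf(p) then reproduces Y(p), because the (p + 0.24) factors cancel. This is a minimal sketch, and the test frequencies are arbitrary.

```python
# Poles of the realized approximation for T = 1 second (Section 13.1.1)
poles_T1 = [-4.17391, -31.72813, complex(-3.04898, 4.16463)]
T = 20.0
r1, r2, z = (pole / T for pole in poles_T1)
quad_b = -2 * z.real               # p^2 + quad_b p + quad_c holds the complex pair
quad_c = abs(z) ** 2
print(round(-r1, 4), round(-r2, 4), round(quad_b, 4), round(quad_c, 5))

def Y_i(p):
    """Input network: p / ((p + 0.2087)(p + 1.5864))."""
    return p / ((p - r1) * (p - r2))

def Y_f(p):
    """Feedback network: (p^2 + 0.3049p + 0.0666) / (p + 0.24)."""
    return (p**2 + quad_b * p + quad_c) / (p + 0.24)

def Y_cascade(p):
    """First-derivative stage p/(p + 0.24) followed by the circuit Y_i/Y_f."""
    return p / (p + 0.24) * Y_i(p) / Y_f(p)

def Y_target(p):
    """Partially factored form, constant factor omitted."""
    return p**2 / ((p - r1) * (p - r2) * (p**2 + quad_b * p + quad_c))

for p in (0.05j, 0.3 + 0.2j, 2.0):
    assert abs(Y_cascade(p) - Y_target(p)) <= 1e-9 * abs(Y_target(p))
```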
The input network has four elements, whereas Yi(p) has only two parameters. Hence there are two degrees of freedom in the element values of this network. One degree of freedom must be reserved for the impedance level; the other permits some latitude in the relative values of the resistances and stiffnesses.
The feedback network has four independent elements, whereas Yf(p) has three parameters. Hence there is only one degree of freedom in
the element values of this network. This degree
of freedom must be reserved for the impedance
level.
There is, however, one degree of freedom be-
tween the impedance levels of the two net-
works. This follows from the fact that the
transmission function of the circuit is the ratio
of the transmission functions of the individual
networks. The scale factor for the transmission
function of the circuit is readily determined
from the fact that the transmission function
must be approximately pRt,C„ at small values
of p.
13.2
CIRCUIT FOR CLOSE SUPPORT
PLOTTING BOARD
In this application, position data smoothing
with delay correction for constant rates of
change in position was required. Assuming flat
random noise in position data, and, arbitrarily,
1-second smoothing time, the best transmission
function for position data smoothing without
delay correction is y0(p) in the notation of Section 11.3. The best transmission function for the first-derivative circuit, if it were required, is p y1(p). Hence, the best transmission function for position data smoothing with full delay correction is
$$Y_1(p) = y_0(p) + \tfrac{1}{2}\,p\,y_1(p) .$$
This corresponds to the weighting function
$$W_1(t) = 2(2 - 3t), \qquad 0 < t < 1 .$$
The series expansion for Y1(p) is, by (15) of Chapter 11,
$$Y_1(p) = 1 - \frac{p^2}{12} + \frac{p^3}{30} - \frac{p^4}{120} + \cdots$$
The form of the rational approximation was chosen as
$$Y(p) = \frac{1 + a_1 p}{1 + b_1 p + b_2 p^2 + b_3 p^3}$$
in order to obtain a loss characteristic which has an ultimate slope of 12 db per octave.* This requirement was also set as a precaution against noise due to granularity of the coordinate-conversion potentiometers. The coefficients are determined by
$$b_1 = a_1$$
$$b_2 - \tfrac{1}{12} = 0$$
$$b_3 - \tfrac{1}{12}b_1 + \tfrac{1}{30} = 0$$
$$-\tfrac{1}{12}b_2 + \tfrac{1}{30}b_1 - \tfrac{1}{120} = 0$$
whence
$$Y(p) = \frac{1 + \frac{11}{24}p}{1 + \frac{11}{24}p + \frac{1}{12}p^2 + \frac{7}{1440}p^3} .$$
This may be expressed in the form Y(p) = Yi(p)/Yf(p), where
$$Y_i(p) = \frac{1}{1 + 0.1053p}, \qquad Y_f(p) = \frac{1 + 0.3530p + 0.04615p^2}{1 + 0.4583p} .$$
The circuit configuration is shown below in Figure 12.

Figure 12. Physical configuration of data-smoothing circuit for close support plotting board.

* This design also antedated the formulation of the n − m = r + 1 rule given in Section 12.2, according to which we should have taken Y1(p) = y1(p) + ½ p y2(p).
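The series of Y1(p), the coefficient equations, and the factorization into Yi/Yf can all be checked with exact fractions. In this sketch the series is computed directly from the weighting function 2(2 − 3t) on 0 < t < 1. That is our reading of expressions (15) of Chapter 11, which lie outside this excerpt. The real factor 1 + τp is then found by bisection, and the bracketing interval is a choice made for the sketch.

```python
from fractions import Fraction as F
import math

def series_of_weighting(n_terms):
    """Coefficients of Y1(p) = integral_0^1 2(2 - 3t) e^(-pt) dt in powers of p:
    (-1)^k / k! * integral_0^1 (4 - 6t) t^k dt."""
    return [F((-1) ** k, math.factorial(k)) * (F(4, k + 1) - F(6, k + 2))
            for k in range(n_terms)]

c = series_of_weighting(5)
print([str(x) for x in c])          # ['1', '0', '-1/12', '1/30', '-1/120']

# (1 + a1 p) = (1 + b1 p + b2 p^2 + b3 p^3)(c0 + c1 p + ... + c4 p^4) through p^4,
# using c0 = 1 and c1 = 0:
b2 = -c[2]                          # p^2 coefficient
b1 = -(c[4] + b2 * c[2]) / c[3]     # p^4 coefficient
b3 = -c[3] - b1 * c[2]              # p^3 coefficient
a1 = b1 + c[1]                      # p^1 coefficient
print(a1, b1, b2, b3)               # 11/24 11/24 1/12 7/1440

def cubic(p):
    return 1 + float(b1) * p + float(b2) * p**2 + float(b3) * p**3

# The single real root lies between -20 and -1 (cubic(-20) < 0 < cubic(-1)).
lo, hi = -20.0, -1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cubic(mid) < 0 else (lo, mid)
tau = -1.0 / (0.5 * (lo + hi))              # real factor 1 + tau p
q1, q2 = float(b1) - tau, float(b3) / tau   # remaining factor 1 + q1 p + q2 p^2
print(round(tau, 4), round(q1, 4), round(q2, 5))   # compare 0.1053, 0.3530, 0.04615
```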
13.3 CIRCUIT FOR GROUND-CONTROL
BOMBING COMPUTER
In this application, rate smoothing as well as
position smoothing was required. In addition,
delay correction in position, for constant rate
of change, was to be available but optional, and
the loss characteristic was to have an ultimate
slope of 12 db per octave, or more.
In accordance with the n − m = r + 1 rule, the best transmission function for position data is y1(p), whereas that for rate is p y2(p). A number of designs were made on this basis. However, from the point of view of network economy they were inferior to a design based on y2(p) for position data. The use of y2(p) for position data is not consistent, theoretically, with the use of p y2(p) for rate, but the practical advantage outweighs the theoretical disadvantage.
The rational approximation used for y2(p) is the one given in (6), Section 12.2. It may be expressed as
$$Y(p) = \frac{Y_i(p)\,Y_o(p)}{Y_f(p)}$$
where
$$Y_i(p) = \frac{1}{1 + 0.2153p}, \qquad Y_f(p) = \frac{1 + 0.2847p + 0.03870p^2}{1 + 0.1359p}, \qquad Y_o(p) = \frac{1}{1 + 0.1359p} .$$

Figure 13. Physical configuration of linear prediction circuit for ground-control bombing computer (alternative connections shown for first derivative and for delay correction).
It may be noted that a redundant factor has been introduced, viz., 1 + 0.1359p, in order to secure a physically realizable Yf(p). The coefficient was chosen so that a resistance would not be required in the shunt branch of the feedback network. Referring to the circuit configuration in Figure 13, the transmission function of the input network is Yi(p), that of the feedback network is Yf(p), and that of the output network at the top is Yo(p).
The output impedance of the amplifier is reduced nearly to zero by virtue of shunt feedback.1 Hence, the rate circuit, as shown in Figure 13, may be derived from the amplifier output through a simple additional network whose transmission function is pYo(p). Two rate outputs are provided so that the delay introduced in position may be corrected optionally without disturbing scale factors.
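The effect of the redundant factor can be checked by multiplying out. The factor 1 + 0.1359p cancels between Yo(p) and Yf(p), and the remaining denominator works out to about 1 + 0.5p + 0.1p² + 0.00833p³. This is a minimal sketch. We have not compared the result against (6) of Section 12.2, which lies outside this excerpt.

```python
def poly_mul(a, b):
    """Multiply polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def Y_i(p):
    """Input network."""
    return 1 / (1 + 0.2153 * p)

def Y_f(p):
    """Feedback network, including the redundant factor 1 + 0.1359p."""
    return (1 + 0.2847 * p + 0.03870 * p**2) / (1 + 0.1359 * p)

def Y_o(p):
    """Output network."""
    return 1 / (1 + 0.1359 * p)

# With the redundant factor cancelled, Y_i * Y_o / Y_f = 1 / den(p)
den = poly_mul([1.0, 0.2153], [1.0, 0.2847, 0.03870])
print([round(coef, 5) for coef in den])

for p in (0.5j, 2 + 1j, 4.0):
    direct = 1 / sum(coef * p**k for k, coef in enumerate(den))
    assert abs(Y_i(p) * Y_o(p) / Y_f(p) - direct) < 1e-12
```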
13.4 CIRCUIT USING SERVOMECHANISMS
In the final report, October 25, 1945, to
NDRC Division 7, on the research program car-
ried on under Contract NDCrc-178, a list is
given of a number of the more important prac-
tical advantages for the use of a-c carrier in
computing circuits. These advantages are:
1. Permits operation at lower levels before
running into trouble with thermal noise, contact
potentials, drifts due to temperature;
2. Permits use of transformers for imped-
ance matching, voltage transformations, cou-
pling between balanced and unbalanced circuits ;
3. Permits use of hybrid coils for voltage
summations of moderate precision ;
4. Eliminates the necessity for modulators in
servo circuits using a-c motors ;
5. Permits reduction in total power consump-
tion, rectified power for amplifiers, and voltage
regulation.
However, the techniques of differentiation and of data smoothing with fixed networks in computing circuits which use d-c carrier are not applicable to computing circuits which use a-c carrier.
The circuit described here is an example of one of the techniques used in the T15-E1 experimental curved flight director. In Figure 14 servo motorsg are indicated by M, and generators by G. The motors are two-phase induction motors with one phase winding of each energized directly by the carrier source at constant amplitude. The generators are essentially two-phase induction motors also with one phase winding of each energized directly by the carrier source at constant amplitude. They deliver, at the other phase windings, carrier voltage at amplitudes proportional to the angular velocities dθ1/dt and dθ2/dt of the shafts. The potentiometers are energized by the carrier source at constant amplitude. They deliver carrier voltage at amplitudes proportional to the angular positions θ1 and θ2 of the shafts from some reference positions. The position data are represented by the modulation amplitude E.

Figure 14. Electromechanical linear prediction circuit.

g The technique of using servo motors for smoothing, as described above, is due chiefly to E. L. Norton.
With amplifiers of sufficiently large voltage
gain and power capacity, and motors of suffi-
ciently large torque, the operational equations
of the circuit are readily found by equating to
zero the sum of the voltages applied to each
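amplifier. Thus
$$\theta_1 + (\alpha_1 + \beta p)\,\theta_2 = E$$
$$p\,\theta_1 - (1 + \alpha_2 p)\,\theta_2 = 0$$
whence
$$\theta_1 = \frac{1 + \alpha_2 p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E, \qquad \theta_2 = \frac{p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E .$$
The angular position θ1 therefore represents the smoothed position data while the angular position θ2 represents the smoothed rate.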
CONFIDENTIAL
Chapter 14
VARIABLE AND NONLINEAR CIRCUITS
The past discussion has been more or less clearly directed at predictor systems having certain well-defined properties. For example, it has been tacitly assumed that the first part of the prediction system will consist of geometrical manipulations transforming the raw input data into other quantities, such as the components of velocity in Cartesian or intrinsic coordinates, which we have some physical reason to believe should be approximately constant for extended periods.ᵃ These quantities, then, are isolated explicitly in the circuit and are the actual effective inputs of the data-smoothing networks. The data-smoothing networks themselves are, of course, definitely assumed to be linear and invariable.

This is obviously a straightforward attack, but it does not necessarily exhaust all possibilities. For example, advantages may be gained by using data-smoothing networks which are nonlinear or which vary with time or target position. It may also be possible to smooth the input data according to some geometric assumption, such as straight line flight, without the necessity of isolating geometrical parameters explicitly.

This chapter attempts to illustrate these possibilities by some rather scattered examples. Data-smoothing networks which vary with time seem to give improved performance over fixed networks, and have been studied with some care. Several examples are given at the end of the chapter. None of the other lines, however, has been explored at all thoroughly. The examples of data-smoothing networks variable with time are, in a sense, illustrations of nonlinearity also, since they all operate on the assumption that the cycle of the network's variation with time begins anew at each marked change in course. Since a change in course is exactly like a tracking error, except that it is much larger, this resetting requires a nonlinear control circuit which responds to large amplitude effects but not to small ones. This, however, is evidently a very mild sort of nonlinearity. More thoroughgoing nonlinearities have not been studied. There seems to be no a priori reason for supposing that they would appreciably improve the performance of data-smoothing networks.

The first part of the chapter gives examples of data-smoothing schemes which do not require the isolation of geometrical parameters. They are based on degenerative feedback circuits which satisfy the requisite formal relations but which might, in some cases, be unstable in practice. This portion of the material is included primarily for its possible suggestive value rather than for its concrete practical usefulness.

ᵃ This is true ideally even in the Wiener system, since Wiener assumes that transformations will be made to some suitable coordinate system, preferably the intrinsic, before the statistical prediction method is applied.
14.1 THE PROTOTYPE FEEDBACK CIRCUIT

The diversity of particular circuits can be given a certain unity by regarding them all as modifications of the feedback smoothing circuit shown originally in Figure 2 of Chapter 10. In accordance with the discussion of that figure, it will be convenient to suppose that the resistive feedback path is introduced to limit the gain of the amplifier proper, so that the structure reduces to an amplifier with high but finite gain and a pure capacity feedback. The circuit has a net loop gain, and is consequently degenerative, at any moderately high frequency.
For our present purposes, it is convenient to recall the general property of degenerative feedback amplifiers that they tend to suppress any given frequency by the amount of the degenerative feedback for that frequency. This suppression obtains not only at the amplifier output but at many other points in the circuit as well. For example, it holds at the amplifier input if we combine the original applied voltage with the voltage contributed by the feedback circuit.ᵇ Thus, except for the absolute signal level, it is not necessary to transmit through the amplifier of Figure 2 of Chapter 10 in order to produce the smoothing effect. It would be sufficient to hang the input circuit of the amplifier, as a two-terminal impedance, across the circuit.

ᵇ This follows immediately from the fact that, since the characteristics of the amplifier proper are not changed by the addition of the feedback path, the output voltage is always a fixed multiple of the net input voltage.
14.2 SIMULTANEOUS SMOOTHING IN THREE COORDINATES

The property of degenerative feedback circuits which has just been described is conveniently illustrated by a three-dimensional extension of the original smoothing circuit of Figure 2 of Chapter 10. The three-dimensional circuit is shown in Figure 1. The three input voltages are the quantities $\dot D$, $D\dot E$, and $D\dot A\cos E$, where D, E, and A are, respectively, slant range, elevation, and azimuth. The three voltages will be recognized as the three components of the target motion in a tilted and rotating rectangular coordinate system. One axis of the tilted system is directed along the instantaneous line of sight to the target, and the other two are perpendicular to this one in the vertical and horizontal planes, respectively.ᶜ It is assumed that these input rates represent target motion in a straight line, plus the usual tracking errors. The object of the smoothing system is to provide shunt impedances which will tend to suppress the tracking errors by feedback action, according to the principles described in the preceding section, without disturbing the portions of the input voltages corresponding to the assumed straight line path.

Figure 1. Feedback smoothing in three coordinates (coordinate converters, modulators, and demodulators in the feedback paths).
We can simplify the analysis by restricting our attention to the special case of two-dimensional motion which occurs when the target course lies in a vertical plane passing directly through the antiaircraft position. This is illustrated in Figure 2. In this case the component $D\dot A\cos E$ is evidently zero. If we represent the voltages at the other two terminals, including both the original applied voltages and the voltages fed back through the circuit, by $V_1$ and $V_2$, the voltages coming out of the coordinate converter on the right-hand side in Figure 2 are

$$v_x = V_1\cos E - V_2\sin E, \qquad v_y = V_2\cos E + V_1\sin E. \tag{1}$$

These voltages are differentiated, passed through a second coordinate converter, and fed back, so that the output voltages must satisfy

$$V_1 = \dot D - \mu\,(\dot v_x\cos E + \dot v_y\sin E), \qquad V_2 = D\dot E - \mu\,(\dot v_y\cos E - \dot v_x\sin E). \tag{2}$$
In order to exhibit the smoothing action of the circuit, let us denote the observed velocity components, referred to the upright and fixed rectangular coordinate system, by $u_x$ and $u_y$, so that

$$u_x = \dot D\cos E - D\dot E\sin E, \qquad u_y = D\dot E\cos E + \dot D\sin E. \tag{3}$$

Substituting (2) and (3) into (1), we get

$$v_x = u_x - \mu\,\dot v_x, \qquad v_y = u_y - \mu\,\dot v_y,$$

or

$$\mu\,\dot v_x + v_x = u_x, \qquad \mu\,\dot v_y + v_y = u_y.$$

ᶜ This is the coordinate system which was used in the experimental T15 director. A complete prediction circuit can be obtained by using the three voltages described here as inputs to the lead servos in the T15 system. In the actual T15 system, rates in the tilted and rotating coordinate system were obtained by the so-called "memory point" method. The voltages $\dot D$, $D\dot E$, etc., required with the present method, might be obtained with the help of tachometers attached to the tracking shafts to measure the instantaneous values of $\dot D$, $\dot E$, and $\dot A$. An equivalent to the variable smoothing of the memory point method can be obtained by making the gains in the feedback paths in Figure 1 variable according to the principles described in a later section.
These show clearly that $v_x$ and $v_y$ are smoothed values of $u_x$ and $u_y$, respectively. If μ is constant, the smoothing is of fixed exponential type. If μ is proportional to the time up to some maximum value, the smoothing is of the variable type described in Sections 14.6 and 14.7.
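The substitution that leads from (1), (2), and (3) to $v = u - \mu\dot v$ holds for any elevation E, which is why the unsmoothed orientation of the tilted system does no harm. The following Python check is a sketch, not part of the original analysis. It draws arbitrary values for E, the observed rates, the fed-back derivatives, and μ, evaluates (2), passes the result through the converter (1), and reports the largest deviation from $u - \mu\dot v$. The function names are hypothetical.

```python
import math
import random


def converter(a, b, E):
    """Coordinate converter of the form used in (1) and (3):
    returns (a cos E - b sin E, b cos E + a sin E)."""
    return a * math.cos(E) - b * math.sin(E), b * math.cos(E) + a * math.sin(E)


def max_residual(trials=1000, seed=1):
    """Largest deviation from v = u - mu * vdot over random test points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        E = rng.uniform(0.0, math.pi / 2)                          # elevation
        D_dot, D_E_dot = rng.uniform(-1, 1), rng.uniform(-1, 1)    # observed rates
        vx_dot, vy_dot = rng.uniform(-1, 1), rng.uniform(-1, 1)    # fed-back derivatives
        mu = rng.uniform(0.1, 5.0)                                 # feedback gain
        # Equation (2): feedback through the second converter.
        V1 = D_dot - mu * (vx_dot * math.cos(E) + vy_dot * math.sin(E))
        V2 = D_E_dot - mu * (vy_dot * math.cos(E) - vx_dot * math.sin(E))
        vx, vy = converter(V1, V2, E)            # equation (1)
        ux, uy = converter(D_dot, D_E_dot, E)    # equation (3)
        worst = max(worst, abs(vx - (ux - mu * vx_dot)), abs(vy - (uy - mu * vy_dot)))
    return worst


print(max_residual())
```

The residual is at the level of floating-point rounding, because the rotation in (2) is exactly the inverse of the one in (1).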
To complete the discussion of the circuit, we observe that by (1)

$$V_1 = v_x\cos E + v_y\sin E, \qquad V_2 = v_y\cos E - v_x\sin E.$$

These show that $V_1$ and $V_2$ are the smoothed rate components referred to the tilted and rotating rectangular coordinate system. The fact that the orientation of this coordinate system, which depends upon the observed angular height E, is not smoothed makes no difference to the computation of the leads, because this computation is made instantaneously in the same coordinate system to which the smoothed rate components are instantaneously referred.
The analysis in the general case including all three coordinates is of the same nature. Since the rate components in fixed rectangular coordinates appear in the middle of the feedback path, it is perhaps not fair to regard the circuit as an illustration of a data-smoothing device which does not rely upon the explicit isolation of the geometrical parameters of the assumed target path. It should be pointed out, however, that in comparison with a straightforward geometrical solution, in which velocity components in fixed coordinates are first isolated explicitly, then smoothed, and then used to form the basis of prediction, the circuit in Figure 1 has the advantage that most of the components can be built with very low precision. What is transmitted around the feedback loop is essentially the tracking errors only. Since tracking errors are always small, very high percentage errors in the system can be tolerated.ᵈ

Figure 2. Feedback smoothing in two coordinates.

ᵈ An exception to this statement must be made for errors in the coordinate converters which fluctuate rapidly with target position.

14.3 SMOOTHING NETWORKS VARIABLE WITH TARGET POSITION

It was mentioned earlier that changing the data-smoothing network with the target coordinates represented one way in which the results obtained from fixed networks could be generalized. In a sense, the coordinate conversions of Figure 1 are illustrations of these possibilities. A better illustration, however, is provided by the circuit of Figure 3. The structure is intended to give smoothed slant range rate from slant range data, under the assumption of unaccelerated straight line target motion.

Figure 3. Feedback smoothing with smoothing variable with position coordinates.
The relation between input and output in Figure 3 is readily seen to beᵉ

$$V = \frac{d}{dt}\left[D - \mu\,\frac{d}{dt}(DV)\right]$$

or

$$\mu\,\frac{d^2}{dt^2}(DV) + V = \frac{dD}{dt}, \tag{4}$$

where μ is the amplifier gain, D is slant range, and V = dD/dt is slant range rate.
The principle of the circuit depends upon the fact that, under the assumed target motion, the square of the slant range, $D^2$, should be a quadratic function of time, so that $D\,(dD/dt)$ should be a linear function of time and $\frac{d}{dt}\left[D\,\frac{dD}{dt}\right]$ should be a constant. This last is the quantity which is fed back in Figure 3. If it actually is a constant, it has no further influence on the calculation, since the forward circuit includes a differentiator, and the operation of the circuit is the same as though no feedback term were present. This can be verified by setting $D = D_0 = \sqrt{a + 2bt + ct^2}$, corresponding to ideal straight line flight, in equation (4). It is readily seen that the equation is satisfied by

$$V = V_0 = \frac{dD_0}{dt} = \frac{b + ct}{\sqrt{a + 2bt + ct^2}},$$

the first or feedback term being zero.
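For a target moving along $\mathbf r(t) = \mathbf r_0 + \mathbf v t$ (gun at the origin), the constants in $D_0$ work out to $a = |\mathbf r_0|^2$, $b = \mathbf r_0\cdot\mathbf v$, and $c = |\mathbf v|^2$. The Python sketch below uses these identifications, which are our own derivation and not stated in the report, and checks two things by finite differences: the closed-form $V_0$ equals $dD/dt$, and $D V_0$ has zero second derivative, so the feedback term in (4) vanishes. The target coordinates are hypothetical, in arbitrary units.

```python
import math


def straight_line_check(r0, v, t, h=1e-2):
    """For r(t) = r0 + v t, compare the geometric slant range with
    D0 = sqrt(a + 2bt + ct^2) and check that d^2/dt^2 (D0 * V0) vanishes.
    Returns (|dD/dt - V0|, second difference of D * V0)."""
    a = sum(x * x for x in r0)                       # |r0|^2
    b = sum(x * y for x, y in zip(r0, v))            # r0 . v
    c = sum(y * y for y in v)                        # |v|^2

    def D(s):                                        # slant range from geometry
        return math.sqrt(sum((x + y * s) ** 2 for x, y in zip(r0, v)))

    def V0(s):                                       # closed-form rate (b + c s) / D0
        return (b + c * s) / math.sqrt(a + 2 * b * s + c * s * s)

    rate_error = abs((D(t + h) - D(t - h)) / (2 * h) - V0(t))
    g = lambda s: D(s) * V0(s)                       # should be linear: b + c s
    curvature = (g(t + h) - 2 * g(t) + g(t - h)) / h ** 2
    return rate_error, curvature


# Hypothetical level-flying target, arbitrary units.
rate_error, curvature = straight_line_check((3000.0, -2000.0, 1500.0), (-150.0, 100.0, 0.0), t=5.0)
print(rate_error, curvature)
```

Both numbers are limited only by finite-difference truncation and rounding, since $D_0 V_0 = b + ct$ exactly.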
If, on the other hand, D does not correspond exactly to straight line flight, either because of tracking errors or actual target maneuvers, the feedback voltage is no longer constant. In this case transmission around the loop can exist, and the degenerative feedback action produces smoothing in both the input and the output voltage. In calculating the exact effect, we must take account of the fact that the feedback voltage depends upon the D potentiometer in the feedback circuit as well as upon the output voltage V. Since the D potentiometer setting must include the errors in the input data, this means that the output voltage is not perfectly smoothed, even with unlimited gain around the loop. The percentage error in the output rate tends in the limit to approximate the percentage error in D itself. For practical purposes, however, this is a very satisfactory result, since in the absence of smoothing, percentage errors in rates are usually many times those of the corresponding coordinates.
It is apparent that it should be possible to construct many circuits of this general type from the differential equations of the trajectory. A second example is furnished by Figure 4. The operation of the circuit is essentially similar to that of Figure 3. It depends upon the fact that in unaccelerated straight line motion the quantity $D^2\dot A\cos^2 E$ is a constant. Instead of multiplying by $D^2$ and $\cos^2 E$ at a single point in the feedback loop, however, separate multiplications by D and cos E are introduced in the forward and feedback circuits. This permits the output to appear as a smoothed value of the quantity $D\dot A\cos E$, which will be recalled as one of the primary quantities in the circuit of Figure 1.

Figure 4. Another example of feedback smoothing with smoothing variable with position coordinates.

ᵉ The condensers in Figure 3 symbolize differentiation.
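The constancy of $D^2\dot A\cos^2 E$ is the statement that the horizontal projection of the line of sight sweeps out area at a constant rate. Since $D\cos E$ is the horizontal range R, the quantity equals $R^2\dot A$. The sketch below checks this numerically along a straight line. It assumes east/north/up axes and takes azimuth as $A = \mathrm{atan2}(x, y)$ (both conventions are ours), so the constant works out to $y_0 v_x - x_0 v_y$. The trajectory values are hypothetical.

```python
import math


def azimuth_moment(r0, v, t, h=1e-4):
    """D^2 * (dA/dt) * cos^2 E at time t for r(t) = r0 + v t.

    Axes are assumed east (x), north (y), up (z); azimuth A = atan2(x, y).
    dA/dt is taken by a central difference.
    """
    def position(s):
        return [p + q * s for p, q in zip(r0, v)]

    def azimuth(s):
        x, y, _ = position(s)
        return math.atan2(x, y)

    x, y, z = position(t)
    D = math.sqrt(x * x + y * y + z * z)
    cos_E = math.hypot(x, y) / D
    dA = azimuth(t + h) - azimuth(t - h)
    dA = (dA + math.pi) % (2 * math.pi) - math.pi   # guard against the +/-pi wrap
    A_dot = dA / (2 * h)
    return D * D * A_dot * cos_E * cos_E


r0, v = (4000.0, 3000.0, 1000.0), (-100.0, 50.0, 0.0)
print([azimuth_moment(r0, v, t) for t in (0.0, 10.0, 20.0, 40.0)])
# Analytically, y0*vx - x0*vy = 3000*(-100) - 4000*50 = -500000 at every t.
```

The altitude z cancels out of the product, which is why multiplying by D and cos E separately, as in Figure 4, is enough to recover a constant in straight-line flight.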
14.4 NETWORKS VARIABLE WITH TIME

In addition to making the parameters of the data-smoothing network vary as functions of the coordinates of target position, we may also make them variable as functions of time. The advantage of variation with time can be understood by going back to the discussion of the analytic arc assumption and its consequences for fixed data-smoothing networks, as given in Chapters 9, 10, and 11. It will be recalled that for any given settling time there was an optimum choice of the network's weighting function. The choice of the settling time itself, however, was always a compromise. On the one hand, making the settling time too short led to too little smoothing, so that the dispersion in the resulting fire became excessive. On the other hand, too long a settling time meant that data from previous unrelated segments were retained in the smoothing circuit during too large a proportion of an average individual segment of the target path, leaving too small a residue of the average segment as useful firing time.
It is evident that it is theoretically possible to escape the consequences of this compromise by resorting to variable structures. We need merely assume that the network always has a weighting function appropriate for a settling time equal to the time since the last change in course. This would give a small amount of smoothing shortly after a change in course, with more smoothing and consequently greater accuracy later on. No firing time, however, is sacrificed waiting for the network to settle.

In order to exploit these possibilities we must, of course, be able to design networks to give at least approximately the right sequence of weighting functions. It is also necessary to provide some sort of auxiliary controlling mechanism which will sense changes in target course and return the variable circuits in the smoothing network proper to their initial positions. These are both difficult problems which have been incompletely explored. Some elementary solutions, based principally upon modifications of the degenerative feedback smoothing circuit of Figure 2 of Chapter 10, are, however, given later in the chapter. As a preliminary, the next section gives a formal extension of the general polynomial expansion method of Chapter 11 to the variable case.
14.5 GENERAL POLYNOMIAL SOLUTION FOR VARIABLE NETWORKS

The extension of the general method of Chapter 11 to the variable case requires two modifications.

1. The lower limit of the integral to be minimized is now taken as zero, in anticipation of the possibility of discriminating between relevant and irrelevant data on the basis of time of arrival.
2. The weighting function may now depend more generally upon the variable of integration and the upper limit of integration.

With these modifications there is no longer any advantage in conducting the analysis in terms of the age variable τ. To deal directly with the minimization of the integral

$$\int_0^t \left[E(\lambda) - \hat E(\lambda)\right]^2 W_0(t,\lambda)\,d\lambda, \tag{5}$$

let

$$\hat E(\lambda) = V_0 + V_1\,G_1(t,\lambda) + \cdots + V_n\,G_n(t,\lambda), \tag{6}$$

where $G_m(t,\lambda)$ is an mth degree polynomial in λ. Also, let

$$\int_0^t W_0(t,\lambda)\,d\lambda = 1,$$

$$\int_0^t G_l(t,\lambda)\,G_m(t,\lambda)\,W_0(t,\lambda)\,d\lambda = \begin{cases} 0 & \text{if } l \ne m,\\ 1/k_m & \text{if } l = m,\end{cases} \qquad (G_0 = 1,\ k_0 = 1).$$

Then (5) is a minimum with respect to the $V_m$'s in (6) if

$$V_m(t) = \int_0^t E(\lambda)\,W_m(t,\lambda)\,d\lambda, \tag{7}$$

where

$$W_m(t,\lambda) = k_m\,G_m(t,\lambda)\,W_0(t,\lambda). \tag{8}$$
The possibility of physically realizing the $V_m(t)$ depends upon the possibility of realizing networks with impulsive admittances $W_m(t,\lambda)$, in the sense that $W_m(t,\lambda)$ is the response of a network, at time t, to a unit impulse applied at time λ, where 0 < λ < t. Taking this possibility for granted, the predicted value $\hat E(t + t_f)$ is, according to (6), a variable linear combination of the $V_m(t)$, viz.,

$$\hat E(t + t_f) = V_0(t) + G_1(t, t + t_f)\,V_1(t) + \cdots + G_n(t, t + t_f)\,V_n(t). \tag{9}$$
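Equations (5) through (9) amount to a weighted least-squares fit of a polynomial to the data on 0 ≤ λ ≤ t, carried out in a basis that is orthogonal with respect to $W_0$. The Python sketch below is a discretized illustration under our own assumptions. The integrals are replaced by an n-point midpoint rule, the $G_m$ are built by Gram-Schmidt orthogonalization of 1, λ, λ², ..., and the names `fit_and_predict`, `poly_eval`, and `t_f` are hypothetical. The same discrete inner product is used for the orthogonalization and for (7), so data that are exactly a polynomial of the fitted degree are reproduced exactly.

```python
def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i)."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def fit_and_predict(E, W0, t, degree, t_f, n=2000):
    """Discretized sketch of equations (5)-(9).

    E(lam) is the data and W0(t, lam) a weighting with unit integral on [0, t].
    Returns the extrapolated value E_hat(t + t_f).
    """
    h = t / n
    lam = [(i + 0.5) * h for i in range(n)]          # midpoint-rule nodes
    w = [W0(t, x) for x in lam]

    def inner(p, q):
        """Discrete version of the integral of p * q * W0 over [0, t]."""
        return h * sum(poly_eval(p, x) * poly_eval(q, x) * wx for x, wx in zip(lam, w))

    # Gram-Schmidt on 1, lam, lam^2, ... gives polynomials G_m of degree m.
    G = []
    for m in range(degree + 1):
        g = [0.0] * m + [1.0]
        for b in G:
            proj = inner(g, b) / inner(b, b)
            g = [gi - proj * (b[i] if i < len(b) else 0.0) for i, gi in enumerate(g)]
        G.append(g)

    prediction = 0.0
    for g in G:
        k_m = 1.0 / inner(g, g)                      # integral of G_m^2 W0 is 1/k_m
        # Equations (7) and (8): V_m = integral of E * k_m * G_m * W0.
        V_m = k_m * h * sum(E(x) * poly_eval(g, x) * wx for x, wx in zip(lam, w))
        prediction += V_m * poly_eval(g, t + t_f)    # equation (9)
    return prediction


# Straight-line data, uniform weighting W0 = 1/t, first-degree fit.
print(fit_and_predict(lambda x: 2 + 3 * x, lambda t, x: 1.0 / t, 10.0, 1, 5.0))
```

With uniform weighting and a first-degree fit, the straight line 2 + 3λ observed up to t = 10 is extrapolated to 2 + 3·15 = 47 at t + t_f = 15. Changing `W0` alters how noise is averaged, but it does not alter this exact reproduction of polynomial data.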
It is clear that all of the $W_m(t,\lambda)$, as well as all of the $G_m(t,\lambda)$ for m = 1, 2, ..., are determined by $W_0(t,\lambda)$. The latter is determined as the best weighting function for position data smoothing, depending upon the characteristics of the noise associated with the position data. The general methods of determining the best weighting function with fixed smoothing time, described in Chapter 10, may be used to determine the best weighting function with variable smoothing time.
Under the assumption that the spectrum of the noise associated with the signal S(t) has a uniform slope of 6k db per octave, we may take over from Section 11.3 the result that the best weighting function is

$$w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\,\frac{[\lambda(t - \lambda)]^k}{t^{2k+1}}, \qquad 0 \le \lambda \le t. \tag{10}$$

The response of the network is then

$$V(t) = \int_0^t S(\lambda)\,w_k(t,\lambda)\,d\lambda. \tag{11}$$

14.6 SPECIAL CASES

It will be illuminating to consider a few special cases of (11).
For k = 0, we have

$$V(t) = \frac{1}{t}\int_0^t S(\lambda)\,d\lambda. \tag{12}$$

Multiplying through by t and differentiating, we get

$$t\,\dot V(t) + V(t) = S(t). \tag{13}$$

This suggests the circuit shown in Figure 5.ᶠ
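A sampled-data analogue makes the meaning of (13) plain. Written as $\dot V = (S - V)/t$, the equation applies a correction whose gain falls off as 1/t. With sample spacing Δ and t = nΔ, a backward-difference step gives a gain of exactly 1/n, independent of Δ, and the recursion then reproduces the uniform average (12) of all data received so far. The sketch below, with the hypothetical name `running_mean`, illustrates this. It is our discretization, not a circuit from the report.

```python
def running_mean(samples):
    """Sampled analogue of (13), t dV/dt + V = S, i.e. dV/dt = (S - V) / t.

    With t = n * delta, the step gain is 1/n, and each output is the
    uniform average (12) of all samples received so far.
    """
    outputs, V = [], 0.0
    for n, s in enumerate(samples, start=1):
        V += (s - V) / n      # correction gain falls as 1/n, like 1/t in (13)
        outputs.append(V)
    return outputs


print(running_mean([3.0, 5.0, 4.0, 8.0]))   # [3.0, 4.0, 4.0, 5.0]
```

A fixed gain in place of 1/n would give the exponential weighting of a fixed network. The falling gain is what makes the settling time grow with the time since the start.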
For k = 1, we have

$$V(t) = \frac{6}{t^3}\int_0^t S(\lambda)\,\lambda\,(t - \lambda)\,d\lambda.$$

Multiplying through by $t^3$ and differentiating twice, we get

$$\frac{t^2}{6}\,\ddot V + t\,\dot V + V = S,$$

which may be written in the form

This suggests the network shown in Figure 6.ᵍ
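The second-order equation for k = 1 can also be produced by two first-order time-variable stages in cascade. One factorization that satisfies it is $(t/3)\dot U + U = S$ followed by $(t/2)\dot V + V = U$. This is our own derivation and is not necessarily the form the report intends. Each stage of the form $(t/a)\dot U + U = S$ has the bounded solution $U(t) = (a/t^a)\int_0^t S(\lambda)\lambda^{a-1}d\lambda$, and composing the two stages gives exactly the parabolic weighting $6\lambda(t-\lambda)/t^3$. The Python sketch evaluates both stages by running quadrature and compares the result with the direct weighted average. The test signal, grid size, and function names are arbitrary choices.

```python
import math


def first_order_stage(values, lam, h, a):
    """Bounded solution of (t/a) U' + U = S sampled at the midpoints lam:
    U(t) = a / t**a * integral_0^t S(l) * l**(a - 1) dl.
    The running integral is a midpoint sum plus a half-cell end correction."""
    out, acc = [], 0.0
    for s, x in zip(values, lam):
        f = s * x ** (a - 1)
        out.append(a * (acc + 0.5 * h * f) / x ** a)
        acc += h * f
    return out


def parabolic_average(S, t, n):
    """Direct evaluation of V(t) = 6 / t^3 * integral_0^t S(l) l (t - l) dl."""
    h = t / n
    total = sum(S((i + 0.5) * h) * (i + 0.5) * h * (t - (i + 0.5) * h) for i in range(n))
    return 6.0 / t ** 3 * h * total


def cascade(S, t, n):
    """Two first-order stages; returns (last sample time, V at that time)."""
    h = t / n
    lam = [(i + 0.5) * h for i in range(n)]
    U = first_order_stage([S(x) for x in lam], lam, h, a=3)   # (t/3) U' + U = S
    V = first_order_stage(U, lam, h, a=2)                      # (t/2) V' + V = U
    return lam[-1], V[-1]


S = lambda x: math.sin(x) + 0.5 * x
t_end, V_cascade = cascade(S, 10.0, 20000)
print(V_cascade, parabolic_average(S, t_end, 20000))
```

The two printed values agree to within the quadrature error. With a = 1, the same stage function reduces to the uniform average (12).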
14.7 NETWORKS WITH A LIMITED RANGE OF VARIATION

By generalizing the above results in various ways, a large number of other examples of variable smoothing networks can be constructed. Since unlimited variation in the smoothing time is not practically possible, or perhaps even tactically optimal, however, it is desirable in discussing any further examples to include also the possibility that the range of variation in the network may be restricted. For any positive integral value of k in (11), the differential equation for V(t) is of the type which may be reduced by the transformation $t = e^{u}$ to a linear differential equation with constant coefficients.ʰ In general, this facilitates the determination of what happens to the weighting function $w_k(t,\lambda)$ when t > T if the variability of the network is stopped at time T. In the case of the first-order equation (13), however, it is just as easy to deal directly in terms of the natural time.
A more general form for (13), which readily yields the effects of a sudden or gradual stoppage of the variability of the network, is

$$\frac{\phi(t)}{\dot\phi(t)}\,\dot V(t) + V(t) = S(t). \tag{14}$$

This corresponds to the response

$$V(t) = \frac{1}{\phi(t)}\int_0^t S(\lambda)\,\dot\phi(\lambda)\,d\lambda,$$

whence the weighting function is

$$w(t,\lambda) = \frac{\dot\phi(\lambda)}{\phi(t)}. \tag{15}$$

ᶠ This circuit is due to S. Darlington.
ᵍ Due to B. T. Weber.
ʰ See Section A.11 for a more general transformation.
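Integrating (15) over 0 < λ < t gives $(\phi(t) - \phi(0))/\phi(t)$, so the response is a true weighted average whenever φ(0) = 0. The Python sketch below checks this numerically for the function φ taken in the first example of this section (φ = t up to T, then $T e^{(t-T)/T}$), with T = 10 as in Figure 8. It also spot-checks two values of the resulting weighting function. The helper names are hypothetical.

```python
import math


def weighting(phi, phi_dot, t, lam):
    """Equation (15): w(t, lam) = phi_dot(lam) / phi(t) for 0 < lam < t."""
    return phi_dot(lam) / phi(t)


def total_weight(phi, phi_dot, t, n=20000):
    """Midpoint-rule estimate of the integral of w(t, lam) over 0 < lam < t."""
    h = t / n
    return h * sum(weighting(phi, phi_dot, t, (i + 0.5) * h) for i in range(n))


T = 10.0  # stopping time, as in Figure 8


def phi_stop(t):
    """phi(t) = t up to T, then T exp((t - T)/T): variation stops at T."""
    return t if t <= T else T * math.exp((t - T) / T)


def phi_stop_dot(t):
    return 1.0 if t <= T else math.exp((t - T) / T)


for t in (5.0, 10.0, 20.0):
    print(t, total_weight(phi_stop, phi_stop_dot, t))
```

Before T, the weighting is uniform at 1/t. After T, data older than T keep a flat but exponentially decaying weight, and newer data are weighted by $(1/T)e^{-(t-\lambda)/T}$.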
The general relation (14) may be realized with the network of Figure 5 by varying the resistance in accordance with

$$R = \frac{1}{C}\,\frac{\phi(t)}{\dot\phi(t)}, \qquad t > 0.$$

However, a more practical circuit results from the introduction of variable potentiometersⁱ in both the capacity and resistance paths of the original feedback smoothing circuit of Figure 2, Chapter 10. This is shown in Figure 7.ʲ It may be noted that the feedback circuit is also applicable to the two cases discussed in the preceding section. It has the advantage for these applications that it does not require the zero-impedance generators and infinite-impedance loads of Figures 5 and 6.

Figure 5. Time-variable smoothing circuit giving uniform weighting function.

Figure 6. Time-variable smoothing circuit giving parabolic weighting function.

Figure 7. Limited range time-variable feedback smoothing circuit.

As an example of (14) we may take

$$\phi(t) = \begin{cases} t, & 0 < t < T,\\ T\,e^{(t-T)/T}, & t > T.\end{cases}$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t, & 0 < t < T,\\ T, & t > T.\end{cases}$$

Hence, in Figure 7, if RC = T,

$$f_c(t) = \frac{t}{T},\quad f_r(t) = 0 \quad (0 < t < T); \qquad f_c(t) = 1,\quad f_r(t) = 1 \quad (t > T).$$

This example obviously calls for a linear potentiometer in the condenser path and a switch in the resistance path. The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} 1/t, & 0 < \lambda < t < T,\\ (1/T)\,e^{-(t-T)/T}, & 0 < \lambda < T < t,\\ (1/T)\,e^{-(t-\lambda)/T}, & 0 < T < \lambda < t.\end{cases}$$

This is illustrated in Figure 8 for T = 10, t = 5, 10, 20.

Figure 8. First example of weighting function produced by circuit of Figure 7.

A second example is furnished by taking

$$\phi(t) = \begin{cases} t^k, & 0 < t < T,\\ T^k\,e^{k(t-T)/T}, & t > T.\end{cases}$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t/k, & 0 < t < T,\\ T/k, & t > T.\end{cases}$$

Hence, in Figure 7, if RC = T/k,

$$f_c(t) = \frac{t}{T},\quad f_r(t) = 1 - \frac{1}{k} \quad (0 < t < T); \qquad f_c(t) = 1,\quad f_r(t) = 1 \quad (t > T).$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} k\lambda^{k-1}/t^k, & 0 < \lambda < t < T,\\ (k\lambda^{k-1}/T^k)\,e^{-k(t-T)/T}, & 0 < \lambda < T < t,\\ (k/T)\,e^{-k(t-\lambda)/T}, & 0 < T < \lambda < t.\end{cases}$$

The first example is the special case k = 1 of this one. This is illustrated in Figure 9 for k = 3/2, T = 10, t = 5, 10, 20.

Figure 9. Second example of weighting function produced by circuit of Figure 7.

A third example is furnished by taking

$$\phi(t) = \begin{cases} t/(2T - t), & 0 < t < T,\\ e^{2(t-T)/T}, & t > T.\end{cases}$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t\,(2T - t)/(2T), & 0 < t < T,\\ T/2, & t > T.\end{cases}$$

Hence, in Figure 7, if RC = T/2,

$$f_c(t) = \frac{t}{T}\left(2 - \frac{t}{T}\right),\quad f_r(t) = \frac{t}{T} \quad (0 < t < T); \qquad f_c(t) = 1,\quad f_r(t) = 1 \quad (t > T).$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} \dfrac{2T\,(2T - t)}{t\,(2T - \lambda)^2}, & 0 < \lambda < t < T,\\[4pt] \dfrac{2T}{(2T - \lambda)^2}\,e^{-2(t-T)/T}, & 0 < \lambda < T < t,\\[4pt] \dfrac{2}{T}\,e^{-2(t-\lambda)/T}, & 0 < T < \lambda < t.\end{cases}$$

This is illustrated in Figure 10 for T = 10, t = 5, 10, 20.

Figure 10. Third example of weighting function produced by circuit of Figure 7.

A fourth example is furnished by taking

$$\phi(t) = e^{kt} - 1, \qquad t > 0.$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \frac{1 - e^{-kt}}{k}, \qquad t > 0.$$

Hence, in Figure 7, if RC = 1/k,

$$f_c(t) = f_r(t) = 1 - e^{-kt}, \qquad t > 0.$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{k\,e^{-k(t-\lambda)}}{1 - e^{-kt}}, \qquad 0 < \lambda < t.$$

For any value of t this weighting function is exponential in λ.

ⁱ In some cases a variable potentiometer may turn out to be a switch.
ʲ This circuit is due to S. Darlington.

14.8 OTHER EXAMPLES
Because there has been no demand for variable networks in the field of communications, the technique of designing practical variable networks is in a very rudimentary stage compared to that of designing fixed networks. In the remainder of this chapter we shall describe some of the circuits which have been developed for specific practical applications.
A memory point method of obtaining smoothed rates, based upon (12), is illustrated below. If S(t), the quantity to be smoothed, represents the time derivative $\dot E(t)$ of the position data E(t), then the average rate is given by

$$V(t) = \frac{1}{t}\int_0^t \dot E(\lambda)\,d\lambda = \frac{E(t) - E(0)}{t}.$$

Under the assumption that the position data, aside from tracking errors, is a linear function of time, the average rate is also the smoothed rate. If the position data is represented by the angular displacement of a shaft in the computer, the quantity E(0) is readily fixed by providing a second shaft which is coupled to the first shaft until t = 0, when the coupling is broken. Potentiometers mounted on the shafts are energized by a voltage varying as a function of time in the manner indicated in Figure 11. The manner in which the smoothed rate is obtained is clear from the figure.

Figure 11. Memory point method of obtaining smoothed rate.
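In computational terms, the memory point method reduces to one subtraction and one division per output. The Python sketch below illustrates this under assumed conditions: position data sampled every `dt`, with the first sample standing in for the memory point E(0) held by the second shaft. The function name and the noise level in the example are hypothetical.

```python
import random


def memory_point_rates(E_samples, dt):
    """Average rate (E(t) - E(0)) / t for every sample after the first.

    E_samples[0] plays the part of the memory point E(0) held by the second
    shaft; the other samples are the live position data E(t).
    """
    E0 = E_samples[0]
    return [(E - E0) / (n * dt) for n, E in enumerate(E_samples[1:], start=1)]


rng = random.Random(0)
dt, true_rate = 0.1, 0.2
E = [1.0 + true_rate * n * dt + rng.gauss(0.0, 0.01) for n in range(201)]
rates = memory_point_rates(E, dt)
print(rates[9], rates[-1])   # estimates at t = 1 and t = 20
```

Since only two position readings enter each estimate, the rate error equals the difference of two tracking errors divided by t. The error therefore falls off as 1/t, which is the variable smoothing of (12).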
The memory point method of obtaining smoothed rates is used in the T15 antiaircraft director.⁴ In this application, however, it is somewhat more complicated than in the simple illustration described above. This is due to the fact that the position data and the memory point are in the polar coordinate system, whereas the rate components are referred to a tilted and rotating rectangular coordinate system which is determined by the instantaneous line of sight.

Figure 12 shows a way of securing variable smoothing in a purely electrical circuit.ᵏ Except for the fact that the division of the current through the condensers is varied discontinuously instead of continuously, this circuit corresponds to the first or the second example discussed in Section 14.7.

Figure 12. Specific limited range time-variable feedback smoothing circuit.

Figure 13 shows the variable smoothing circuitˡ for smoothing first derivatives in the M9A1-E1 antiaircraft director.⁸ This circuit corresponds approximately to the second example of the differential equation (14) given above. The variable element is a thermistor which is heated up to a high temperature, practically instantaneously, by the heater, and then allowed to cool off naturally. By choosing the electrical and thermal constants in the circuit correctly, the resulting smoothing can be made to approximate that obtained in a memory point circuit.

Figure 13. Another specific limited range time-variable feedback smoothing circuit.

ᵏ This circuit is due to S. Darlington.
ˡ Developed by R. F. Wick.
As noted earlier, all these variable circuits require some auxiliary control means to reset the variable circuits to zero whenever a new target is engaged or the current target makes a sudden change in course. In the T15 memory point system this function was performed by an operator. The operator was aided by a series of meters which compared the instantaneous memory point rates with average rates set in some time previously by hand. The visual indication of a change in course, calling for the selection of a new memory point, was a relatively large, smoothly and decisively varying deflection on the meters. In contrast, normal tracking errors appeared as relatively small random fluctuations of the needles. The circuits of Figures 7 and 12, which were intended for bombsight applications, were also under the control of an operator, who was supposed to start the mechanism at the beginning of each bombing run.

Two control methods were used for the circuit of Figure 13. In one, large changes in rate, corresponding to probable changes in target course, were distinguished by comparing the instantaneous value of the target rate, as obtained directly from a differentiator, with the smoothed value obtained at the output of the smoothing circuit. In the other method, equivalent information was obtained by again differentiating the instantaneous value of the target rate, making a second derivative of the target coordinate. In either case this rate difference or second derivative information was used to control a gas tube, which went off, supplying heating current to the variable thermistor, whenever the voltage applied to it exceeded a certain threshold. This threshold evidently marks the minimum change in course for which the variable network will be reset. In order to permit the use of a low threshold without making the circuit unduly liable to false operation because of the effect of tracking errors, the gas tube input voltage was first transmitted through a low-pass filter which suppressed most of the energy due to tracking errors. A considerable amount of work was done on the proportioning of this filter to provide the best protection against false operation with a low threshold and with minimum delay in resetting in case a change of course actually does occur, but the problem remains an interesting subject for research.
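The first control method can be sketched in sampled-data form to show how the threshold and the low-pass filter interact. In this sketch, which is only an analogy and not a model of the thermistor circuit, the variable network is the 1/n-gain running average equivalent to (12). The gas tube is replaced by a simple threshold comparison, and the low-pass filter is a first-order recursive filter with time constant `tau_lp`. Every name and number in it (threshold, filter time constant, noise level, the size of the course change) is hypothetical.

```python
import random


def smooth_with_reset(rates, dt, threshold, tau_lp):
    """Hypothetical sketch of the first control method described above.

    The smoothed rate is the uniform average since the last reset (gain 1/n).
    The difference between the instantaneous and smoothed rate is passed
    through a first-order low-pass filter; when its magnitude exceeds
    `threshold`, the variable network is reset. Returns (smoothed, resets).
    """
    alpha = dt / (tau_lp + dt)          # discrete first-order low-pass coefficient
    smoothed, resets = [], []
    V, lp, n = 0.0, 0.0, 0
    for i, r in enumerate(rates):
        lp += alpha * ((r - V) - lp)    # filtered rate difference
        if n > 0 and abs(lp) > threshold:
            n, lp = 0, 0.0              # reset: start a fresh averaging interval
            resets.append(i)
        n += 1
        V += (r - V) / n                # n = 1 just after a reset gives V = r
        smoothed.append(V)
    return smoothed, resets


# Simulated rate: 1.0 for the first 200 samples, then 3.0 (a change in
# course), with tracking-error noise of standard deviation 0.2.
rng = random.Random(42)
rates = [(1.0 if i < 200 else 3.0) + rng.gauss(0.0, 0.2) for i in range(400)]
smoothed, resets = smooth_with_reset(rates, dt=0.1, threshold=0.5, tau_lp=1.0)
print(resets, smoothed[-1])
```

Raising `tau_lp` suppresses more of the tracking noise and so allows a lower threshold, at the price of a longer delay before the reset. This is the same trade-off noted above for the proportioning of the filter.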
APPENDIX A
NETWORK THEORY

This appendix gives a summary of linear network theory which is pertinent to the analysis and design of data-smoothing and prediction circuits. It is incomplete in many respects and should therefore be supplemented by reference to established textbooks on the subject. However, it contains some results which are new.

The present summary will be concerned mainly with fixed linear networks. Variable linear networks will be considered briefly in the last section.
A.1 IMPULSIVE ADMITTANCE

A fixed linear transmission network is one in which the response V(t) is related to the impressed signal E(t) by a linear differential equation of the form

$$b_n\frac{d^nV}{dt^n} + b_{n-1}\frac{d^{n-1}V}{dt^{n-1}} + \cdots + b_0 V = a_m\frac{d^mE}{dt^m} + a_{m-1}\frac{d^{m-1}E}{dt^{m-1}} + \cdots + a_0 E \tag{1}$$

with constant coefficients. It is well known that the solutions of such a differential equation obey the "superposition principle." This makes it possible to formulate the response of the network to any signal in terms of its response to certain standard signals.
A convenient standard signal for analytical purposes is the "unit impulse." It may be regarded as the limit of the rectangular pulse shown in Figure 1 as the duration of the pulse is decreased indefinitely while the amplitude is increased in such a way that the area under the pulse is always unity. The limiting function thus defined does not exist in a strict mathematical sense. However, it is very convenient for analytical purposes, and seldom leads to difficulties, to proceed as though the limiting function did exist. An impulse occurring at t = λ is conventionally denoted by the singular function $\delta_0(t - \lambda)$, where

$$\delta_0(t) = 0 \ \text{if}\ t \ne 0, \qquad \int_{-\infty}^{t}\delta_0(\tau)\,d\tau = \begin{cases} 0 & \text{if } t < 0,\\ 1 & \text{if } t > 0.\end{cases}$$

Figure 1. Rectangular pulse signal.
The response of a fixed network to an impulse or any form of signal is independent of the time at which the signal is applied, provided it is expressed as a function of the time relative to the application of the signal. Let W(t) be the response to the signal $\delta_0(t)$. This is called the "impulsive admittance" of the network. Physically, it must be identically zero for negative values of t. For an impulse applied at t = λ, the response will therefore be W(t − λ), which is identically zero for t < λ.
A physical signal E(t) such as the one shown in Figure 2 may be resolved into an infinite succession of elementary impulses. The strength of the typical elementary impulsive component, such as the one shown in Figure 2 as occurring at time λ, is E(λ)dλ. Its contribution to the response at time t is E(λ)W(t − λ)dλ. Hence the contribution of all the elementary impulsive components of the signal to the response at time t is given by the formula

$$V(t) = \int_{-\infty}^{+\infty} E(\lambda)\,W(t - \lambda)\,d\lambda. \tag{2}$$

This is one form of the "superposition theorem" for fixed linear networks.

Figure 2. Derivation of superposition theorem.
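The superposition theorem (2) can be checked numerically against a network whose response is known in closed form. The Python sketch below assumes, as an illustrative example that is not taken from the text, a single-time-constant network $\tau\dot V + V = E$. Its impulsive admittance is $W(t) = e^{-t/\tau}/\tau$ for t > 0 and zero for t < 0. The sketch applies E = sin t at t = 0, evaluates (2) by the trapezoidal rule (the integrand vanishes outside 0 ≤ λ ≤ t), and compares the result with the exact solution of the differential equation. The function name `superpose` is hypothetical.

```python
import math


def superpose(E, W, dt):
    """Trapezoidal discretisation of (2) for a signal applied at t = 0:
    V[n] ~ sum_k E[k] * W[n - k] * dt, with W[j] = 0 for j < 0."""
    V = []
    for n in range(len(E)):
        terms = [E[k] * W[n - k] for k in range(n + 1)]
        V.append(dt * (sum(terms) - 0.5 * (terms[0] + terms[-1])))
    return V


# Assumed example network: tau * dV/dt + V = E, whose impulsive admittance
# is W(t) = exp(-t / tau) / tau for t > 0 (and zero for t < 0).
tau, dt, N = 1.0, 0.01, 501
times = [i * dt for i in range(N)]
W = [math.exp(-s / tau) / tau for s in times]
E = [math.sin(s) for s in times]
V = superpose(E, W, dt)

# Closed-form solution of the same equation with V(0) = 0 and E = sin t:
exact = [(math.sin(s) - math.cos(s) + math.exp(-s)) / 2 for s in times]
print(max(abs(a - b) for a, b in zip(V, exact)))
```

The discrepancy is set by the trapezoidal step, of order dt², and not by any approximation in (2) itself.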
Before discussing the reasons for the limits of integration indicated in (2), it will be helpful to consider a graphical interpretation other than the one used in deriving the integral. Let W(t) be of the form shown in Figure 3, and let E(λ) be of the form shown in Figure 4. To determine the response V(t) at a given value of t, the curve in Figure 3 is turned over from right to left and placed over the curve in Figure 4 so that its right-hand edge is at λ = t. The product of the two curves gives a third curve (not shown), which is identically zero for all λ > t. The area under the third curve is the response V(t) at the given value of t. For progressively larger values of t, the curve representing W(t − λ) in Figure 4 is simply slid to the right with respect to the curve representing E(λ).

Figure 3. An impulsive admittance.

Figure 4. Graphical interpretation of superposition theorem.
Since a physical signal must certainly be identically zero up to some definite time, or since it must certainly have been applied to the network at some definite time, that time could be taken arbitrarily as zero and (2) could be written in the form
$$V(t) = \int_{0}^{t} E(\lambda)\, W(t-\lambda)\, d\lambda \qquad (3)$$
In this form, however, since
$$\int_{0}^{t} W(t-\lambda)\, d\lambda$$
is in general a function of t, the response could not be interpreted as a weighted average of the signal. On the other hand, since
$$\int_{-\infty}^{t+} W(t-\lambda)\, d\lambda = \int_{0-}^{\infty} W(\tau)\, d\tau$$
is independent of t, the response may be interpreted as a weighted average of the signal if
$$\int_{0-}^{\infty} W(\tau)\, d\tau = 1 .$$
The necessity of taking the lower limit in (2) as −∞, in order to permit the interpretation of the response as a weighted average of the signal, is also expressed by the point of view that a fixed network cannot make any physical distinction between having no applied signal and having an applied signal which happens to be of zero amplitude.
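The point about the lower limit can be made concrete with a short computation. The sketch assumes W(τ) = e^(−τ), which has unit area, and a constant unit signal. When the signal is counted over all past time as in (2), the response is the full area of W and equals 1 at every t. When the lower limit is moved to 0 as in (3), the same signal gives the integral of W(τ) from 0 to t, which still changes with t. The grid step and the truncation of the infinite range at τ = 40 are also assumptions of the sketch.

```python
import numpy as np

# Assumed weighting function with unit area: W(tau) = exp(-tau), tau >= 0.
dt = 0.001
tau = np.arange(0.0, 40.0 + dt / 2, dt)   # the infinite range is truncated at tau = 40
W = np.exp(-tau)

def trapezoid(y, dx):
    """Composite trapezoidal rule on a uniform grid."""
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

# Limits as in (2): a unit signal that has been present for all past time.
# Every age tau contributes, so the response is the full area of W.
V_form2 = trapezoid(np.ones_like(tau) * W, dt)

# Limits as in (3): the same unit signal counted only from lam = 0, so at
# time t only ages 0 <= tau <= t contribute.
def V_form3(t):
    n = int(round(t / dt)) + 1
    return trapezoid(W[:n], dt)
```

The first value is 1 to within the quadrature error. The second equals 1 − e^(−t) and so depends on t, which is why (3) cannot be read as a weighted average.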
Another shortcoming of the form (3) comes from the consideration of impulsive admittances of such a nature that W(t − λ) has certain kinds of singularities at λ = t. The same shortcoming applies to the form (2) if we set t as the upper limit of integration. For example, the case of direct transmission, expressed in the form
$$V(t) = \int_{0}^{t} E(\lambda)\, \delta_0(t-\lambda)\, d\lambda ,$$
is ambiguous because the singularity in the integrand occurs exactly at one end of the range of integration. However, the form
$$V(t) = \int_{-\infty}^{t+} E(\lambda)\, \delta_0(t-\lambda)\, d\lambda$$
leads, without ambiguity, to the result V(t) = E(t). This example is not trivial. Every network which transmits infinite frequency must have an impulsive admittance of such a nature that W(t − λ) contains a singularity of the form δ0(t − λ). Any attempt to rule out such a singularity, on the ground that physical networks cannot in fact transmit infinite frequency, complicates the analysis and design of networks unduly. If a network is capable of transmitting, or is expected to transmit, frequencies at the top of the range of interest or importance, it is simpler to assume that it is capable of transmitting, or is expected to transmit, all frequencies above that range.
One other advantage of taking the limits of integration as indicated in (2) may be called to attention. E(λ) is identically zero for all values of λ below some definite though perhaps unknown value, and W(t − λ) is identically zero for all values of λ > t. Keeping these two facts in mind, it is clear that (2) may be integrated partially any number of times without incurring the burden of carrying a string of terms outside of the integral. After one partial integration we have
$$V(t) = \int_{-\infty}^{t+} E'(\lambda)\, A(t-\lambda)\, d\lambda \qquad (4)$$
where
$$A(t) = \int_{-\infty}^{t} W(\tau)\, d\tau . \qquad (5)$$
Since E'(λ) is identically zero for all values of λ for which E(λ) is identically zero, and since
A(t − λ) is identically zero for all values of
λ > t, a second partial integration may be per-
formed with no more formal complication than
the first partial integration. The fact of the
matter is that the terms which ordinarily arise
in partial integrations, outside of the integral,
are here carried under the integral by singulari-
ties of the integrand.
The superposition theorem in the form (4) may be derived directly in a manner similar to the derivation of (2). A(t − λ) is the response of the network to a Heaviside unit step function H(t − λ) applied at t = λ, where
$$H(t-\lambda) = \begin{cases} 0 & t < \lambda \\ 1 & t > \lambda . \end{cases}$$
The signal is resolved into two kinds of step functions. Wherever E(λ) is continuous, it contributes an infinite succession of elementary step functions of amplitude E'(λ)dλ. Wherever E(λ) has a finite discontinuity, it contributes a finite step function of amplitude dE(λ). The contribution of each elementary step function to the response at time t is E'(λ)A(t − λ)dλ, and that of each finite step function is A(t − λ)·dE(λ). Hence, the response is given formally by (4), with the understanding that E'(λ)dλ is to be interpreted as dE(λ) wherever E(λ) is discontinuous.*
The response A(t) of the network to a
Heaviside unit step function H(t) applied at
t = 0 is called the "indicial admittance" of the
network. It is more familiar, in the field of
linear transmission theory, than the impulsive
admittance to which it is related by (5), but in
this monograph preference is given to the use
of the impulsive admittance. In the theory of
linear differential equations the impulsive ad-
mittance is known as a Green's function.
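Forms (2) and (4) can be checked against each other numerically. In the sketch below, the impulsive admittance W(t) = e^(−t) and the signal E(λ) = 1 + λ for λ > 0 (zero before) are assumptions. They were chosen because the answer is known in closed form: both forms give V(t) = t. The finite jump of E at λ = 0 enters (4) through A(t)·dE, as described above, and A(t) = 1 − e^(−t) is the running integral of W given by (5).

```python
import numpy as np

# Illustrative assumptions (not from the text):
#   W(t) = exp(-t) for t >= 0, so A(t) = 1 - exp(-t) by (5);
#   E(lam) = 0 for lam < 0 and 1 + lam for lam > 0 (a jump of 1 at lam = 0).
dt = 1e-4
t_end = 3.0
lam = np.linspace(0.0, t_end, 30001)       # uniform grid with spacing dt

W = np.exp(-lam)
A = 1.0 - np.exp(-lam)
E = 1.0 + lam
E_prime = np.ones_like(lam)                # E'(lam) away from the jump

def trapezoid(y):
    """Composite trapezoidal rule with step dt."""
    return dt * (np.sum(y) - 0.5 * (y[0] + y[-1]))

# Form (2): integral of E(lam) W(t_end - lam); W[::-1][j] = W(t_end - lam[j]).
V2 = trapezoid(E * W[::-1])

# Form (4): the jump of size 1 at lam = 0 contributes A(t_end) * 1, and the
# smooth part contributes the integral of E'(lam) A(t_end - lam).
jump = 1.0
V4 = jump * A[-1] + trapezoid(E_prime * A[::-1])
```

Both V2 and V4 come out equal to t_end = 3 to within the quadrature error. Dropping the jump term from V4 would leave the result short by A(3) = 1 − e^(−3).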
It is often convenient to express the response
so that the variable of integration represents
the age of the elementary components of the
signal. Introducing the age variable
$$\tau = t - \lambda \qquad (6)$$
into (2), we have
$$V(t) = \int_{0-}^{\infty} E(t-\tau)\, W(\tau)\, d\tau . \qquad (7)$$
*Formula (4) may be written in the Stieltjes form
$$V(t) = \int_{-\infty}^{t+} A(t-\lambda)\, dE(\lambda) .$$
Alternatively, we may take the point of view that E'(λ) contains impulsive singularities wherever E(λ) is discontinuous. This point of view is generalized in Appendix B.
In this form it is clear that the weighting of
signal components is on the basis of age only.
A fixed network may be said to have a memory
which is a function only of the age of past
events.
In the preliminary stages of designing a
smoothing network, the weighting function
W(τ) is generally prescribed to be identically
zero when τ > T, say, as well as when τ < 0.
This does not violate the conditions of physical
realizability. However, such a weighting function
cannot be obtained exactly with a network
of a finite number of discrete impedance ele-
ments. A finite network invariably yields a
weighting function with a "tail" which extends
to infinity.
A.2 TRANSMISSION FUNCTION
Theoretically, the impulsive admittance of a
prescribed network may be determined directly
from the differential equations of the network
in a perfectly straightforward manner. Prac-
tically, however, it is very difficult to do so if
the network has more than two meshes. Fur-
thermore, the technical problem of designing
a network directly from a prescribed impulsive
admittance is even more difficult, particularly
if the impulsive admittance is not exactly re-
alizable.
These difficulties may be avoided by recourse
to the highly developed methods of network
analysis and synthesis used in the field of com-
munication circuits. These methods are based
upon the steady-state properties of networks.
If a signal consisting of the single sinusoid cos ωt is applied to an invariable or fixed linear transmission network, the steady-state responseᵇ will also be a single sinusoid of the same frequency. The amplitude and phase of the response, relative to the signal, will in general depend upon the frequency. The response may be regarded as the resultant of two components, each with an amplitude coefficient that is a function of the frequency: an "in-phase component" proportional to cos ωt and a "quadrature component" proportional to sin ωt. Furthermore, since the signal is an even function of the frequency, the response should also be an even function of the frequency.ᶜ Hence, the response will be of the form G(ω²) cos ωt − ωH(ω²) sin ωt, where G and H are even real functions of frequency.

ᵇ This is the response apart from transient components, assuming that the latter vanish exponentially with time after the signal is impressed.
ᶜ The signal is also an even function of the time, but this is due only to the particular choice of origin, which is arbitrary.

By a suitable shift of the origin of time it follows that if the impressed signal is sin ωt, the steady-state response will be of the form G(ω²) sin ωt + ωH(ω²) cos ωt.

These two results may be combined into a simpler expression without any loss of individuality. Since e^(iωt) = cos ωt + i sin ωt, where i = √−1, we have
$$V(t) = \left[ G(\omega^2) + i\omega H(\omega^2) \right] e^{i\omega t} \quad \text{if } E(t) = e^{i\omega t} .$$
A further simplification may be achieved by replacing iω by p, and G(−p²) + pH(−p²) by Y(p), so that
$$V(t) = Y(p)\, e^{pt} \quad \text{if } E(t) = e^{pt} . \qquad (8)$$
Y (p) is called the "steady-state transmission
function" or just "transmission function" for
short.
Strictly speaking, (8) expresses the relation
of steady-state response to signal only if p = iω.
However, it is customarily called a steady-state
relation even when p is not a pure imaginary
quantity. It may be noted that Y(p) is real
when p is real.
The simplicity of steady-state analysis de-
rives from the fact that time occurs in the
signal and throughout the network only in the
form e^(pt). In particular, the determination of
the transmission function is reduced to the
solution of simultaneous algebraic equations
which do not involve the time factor. For a net-
work in which the signal and the response are
related by the linear differential equation (1)
with constant coefficients, we obtain simply
$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_m p^m}{b_0 + b_1 p + \cdots + b_n p^n} .$$
It may be noted that the poles of the transmis-
sion function, also referred to as "infinite-gain
points" in the p-plane, correspond to the roots
of the characteristic function of the differential
equation. Physical restrictions on the location
of infinite-gain points will be considered in Sec-
tion A.9.
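For a network governed by a linear differential equation, Y(p) is just a ratio of polynomials. The steady-state response to cos ωt is then the real part of Y(iω)e^(iωt). The sketch below assumes a particular second-order example, b₂V'' + b₁V' + b₀V = a₀E + a₁E', with coefficients chosen for illustration only. It locates the infinite-gain points as roots of the characteristic function. It then confirms that Re[Y(iω)e^(iωt)] satisfies the differential equation when E = cos ωt.

```python
import numpy as np

# Hypothetical example: b2 V'' + b1 V' + b0 V = a0 E + a1 E', so that
# Y(p) = (a0 + a1 p) / (b0 + b1 p + b2 p^2). Lists are in ascending powers of p.
a = [1.0, 0.5]          # a0, a1
b = [1.0, 1.2, 0.5]     # b0, b1, b2

def poly(coeffs, p):
    """Evaluate c0 + c1 p + c2 p^2 + ... for scalar or array p."""
    return sum(ck * p**k for k, ck in enumerate(coeffs))

def Y(p):
    return poly(a, p) / poly(b, p)

# Infinite-gain points: roots of the characteristic function b0 + b1 x + b2 x^2.
poles = np.roots(b[::-1])              # np.roots expects descending powers

# Steady-state response to E(t) = cos(w t) is Re[Y(i w) e^{i w t}].
w = 2.0
t = np.linspace(0.0, 5.0, 11)
V = np.real(Y(1j * w) * np.exp(1j * w * t))

# Check against the differential equation: each d/dt multiplies the phasor
# by (i w), so the left side becomes Re[b(i w) Y(i w) e^{i w t}].
lhs = np.real(poly(b, 1j * w) * Y(1j * w) * np.exp(1j * w * t))
rhs = a[0] * np.cos(w * t) - a[1] * w * np.sin(w * t)   # a0 E + a1 E'
```

The gain |Y(iω)| and phase arg Y(iω) are the amplitude and phase of the steady-state response relative to the signal. Writing Y(iω) = G(ω²) + iωH(ω²) recovers the in-phase and quadrature coefficients used above.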
A.3 RELATIONSHIP BETWEEN
IMPULSIVE ADMITTANCE AND
TRANSMISSION FUNCTION
A relationship between the impulsive admit-
tance and the transmission function of a net-
work may be obtained from (7). Putting
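E(t) = e^(pt) when t > 0, we get
$$V(t) = e^{pt} \int_{0-}^{t} W(\tau)\, e^{-p\tau}\, d\tau = e^{pt} \int_{0-}^{\infty} W(\tau)\, e^{-p\tau}\, d\tau - e^{pt} \int_{t}^{\infty} W(\tau)\, e^{-p\tau}\, d\tau . \qquad (9)$$
The second term in (9) is a transient term due to the fact that we have taken E(t) = 0 when t < 0. The first term in (9), which involves the time only through e^(pt), is the steady-state term. Comparing this term with (8) we get
$$Y(p) = \int_{0-}^{\infty} W(t)\, e^{-pt}\, dt \qquad (10)$$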
or, in the notation which will be introduced in the next section,
$$Y(p) = L[W(t)] . \qquad (11)$$

A.4 LAPLACE AND INVERSE LAPLACE
TRANSFORMS
The frequent use which is made of the
Laplace transform and its inverse, in the
analysis and design of fixed linear networks,
warrants a brief discussion of these trans-
forms.
Given a function f(t) which is identically
zero when t < 0, its Laplace transform g (p) is
defined by the formula
$$g(p) = L[f(t)] = \int_{0-}^{\infty} f(t)\, e^{-pt}\, dt . \qquad (12)$$
This is usually written with 0 for the lower limit. By having the point t = 0 inside the range of integration, instead of at the end, we secure the same advantages for (12) that we gained in the case of (2) by having the point λ = t inside the range of integration. Since f(t) is identically zero when t < 0, we could write −∞ for the lower limit in (12), but this would run the risk of confusion with the so-called "bilateral Laplace transform." On the whole, it is worth while to have a constant reminder that functions f(t) which are not identically zero when t < 0 are ruled out.
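Formula (12), and with it (10), can be evaluated by straightforward quadrature when the integrand decays fast enough. The sketch assumes W(t) = e^(−t), whose transform is 1/(p + 1) in the half-plane R(p) > −1. It truncates the infinite range at t = 50 and uses the trapezoidal rule. The test points for p, including a complex one, are arbitrary choices inside the region of convergence.

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 50.0 + dt / 2, dt)    # infinite upper limit truncated at t = 50

def laplace(f_vals, p):
    """Trapezoidal approximation of the integral of f(t) e^{-pt} over the grid."""
    y = f_vals * np.exp(-p * t)
    return dt * (np.sum(y) - 0.5 * (y[0] + y[-1]))

W = np.exp(-t)                            # assumed example, W(t) = e^{-t}
test_points = [0.5, 2.0, 1.0 + 3.0j]      # all satisfy R(p) > -1
results = {p: (laplace(W, p), 1.0 / (p + 1.0)) for p in test_points}
```

For R(p) ≤ −1 the integral in this example diverges. There, g(p) = 1/(p + 1) is defined only by analytic continuation, as described below.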
The integral in (12) is usually not con-
vergent for all values of p. That is, in order to
secure convergence of the integral, it may be
necessary to assume R(p) >a, where R(p) is
the real part of p, and a is a real number. The
result of the integration is a representation of
g(p) in the half-plane R(p) > a. Since the
representation is analytic throughout the half-
plane, the principle of analytic continuation
allows us to extend the definition of g(p) to
the remainder of the p-plane.
Given a function g(p) which is analytic throughout the half-plane R(p) > c, where c is a real number, its inverse Laplace transform f(t) is given by the formula
$$f(t) = L^{-1}[g(p)] = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} g(p)\, e^{pt}\, dp \qquad (13)$$
provided f(t) is identically zero when t < 0.
If the result of the integration in (13) is not
identically zero when t < 0, g(p) is not a
Laplace transform and the application of the
inverse transformation to it is meaningless.
Translation Theorem
A useful theorem can be established at this
point. This is the translation theorem.
If
$$G(p) = L[F(t)]$$
then
$$L^{-1}\left[ G(p)\, e^{-pa} \right] = F(t-a)$$
provided that F(t − a) ≡ 0 when t < 0. Translation is to the right or left according as a is positive or negative.
If it happens that F(t) ≡ 0 when t < t₀, where t₀ > 0, then the restriction is that a ≥ −t₀. That is, a limited amount of translation to the left is permissible. In general, t₀ = 0 and the restriction is therefore that a ≥ 0. This theorem follows readily from (12) or (13).
In all of the applications of (13) which we
have any occasion to make in the analysis and
design of fixed linear networks, the function
g(p) may be resolved into a sum of terms of the form G(p)e^(−pa), where a ≥ 0 and G(p) is a rational algebraic function with real coefficients. Making use of the translation theorem, the problem of evaluating L⁻¹[g(p)] reduces to that of evaluating L⁻¹[G(p)]. Now, G(p) may be resolved into a sum of terms of the form p^m or 1/(p − a)^(m+1), where m = 0, 1, 2, .... We shall consider these two cases separately.
The case G(p) = p^m will be treated by means of (12) and some limiting processes. In Section A.1 the unit impulse was regarded as the limit of a rectangular pulse of duration T and amplitude 1/T. By means of (12) the Laplace transform of such a pulse over the interval 0 < t < T is
$$\frac{1 - e^{-pT}}{pT} .$$
Hence
$$L[\delta_0(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{pT} = 1 .$$
Formally therefore
$$L^{-1}[1] = \delta_0(t) . \qquad (14)$$
Similarly, the Laplace transform of a pulse over the interval a < t < a + T, where a > 0, is
$$e^{-pa}\, \frac{1 - e^{-pT}}{pT} .$$
Hence
$$L[\delta_0(t-a)] = \lim_{T \to 0} e^{-pa}\, \frac{1 - e^{-pT}}{pT} = e^{-pa} .$$
Formally therefore
$$L^{-1}[e^{-pa}] = \delta_0(t-a) .$$
The last result follows directly from (14) using the translation theorem.
Next, let
$$\delta_1(t) = \lim_{T \to 0} \frac{\delta_0(t) - \delta_0(t-T)}{T} .$$
This is the limiting case, as shown in Figure 5, of two impulses of strengths 1/T and −1/T separated by a time interval T. It may be called an impulse of second order. By (12) and the previous results
$$L[\delta_1(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{T} = p .$$
Formally therefore
$$L^{-1}[p] = \delta_1(t) . \qquad (15)$$
Figure 5. An impulse doublet.
Proceeding in this fashion we may define an impulse of (m + 1)th order as
$$\delta_m(t) = \lim_{T \to 0} \frac{\delta_{m-1}(t) - \delta_{m-1}(t-T)}{T} \qquad (16)$$
and we may then show that
$$L[\delta_m(t)] = p^m .$$
Formally therefore
$$L^{-1}[p^m] = \delta_m(t) . \qquad (17)$$
This disposes of the case G(p) = p^m where m = 0, 1, 2, ....
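The limiting processes behind (14) through (17) are easy to watch numerically. The sketch evaluates the pulse transform (1 − e^(−pT))/(pT). It also evaluates ((1 − e^(−pT))/T)^m, the transform of the mth-order difference that defines δ_m, assuming the same interval T is used at every differencing stage. The value of p is an arbitrary choice. As T shrinks, the first tends to 1 and the second to p^m.

```python
import numpy as np

p = 1.5 + 0.5j          # arbitrary test point in the p-plane

def pulse_transform(T):
    # Transform of a rectangular pulse of height 1/T on 0 < t < T; tends to 1.
    return (1 - np.exp(-p * T)) / (p * T)

def difference_transform(m, T):
    # Each differencing step [f(t) - f(t - T)]/T multiplies the transform by
    # (1 - e^{-pT})/T, so m steps applied to delta_0 give this; tends to p**m.
    return ((1 - np.exp(-p * T)) / T) ** m

errors = {T: (abs(pulse_transform(T) - 1), abs(difference_transform(3, T) - p**3))
          for T in (1e-1, 1e-2, 1e-3)}
```

Both errors shrink roughly in proportion to T, which is the behavior the formal results (14) and (17) summarize.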
The case G(p) = 1/(p − a)^(m+1) will be treated by means of (13) and Jordan's lemma.
Jordan's Lemma
If all the singularities of G(p) can be enclosed by a circle of finite radius with center at the origin, and if G(p) → 0 uniformly with respect to arg p as |p| → ∞, then
$$\lim_{R \to \infty} \int_{\Gamma} G(p)\, e^{pt}\, dp = 0 ,$$
where Γ is a semicircle of radius R with center at the origin. The semicircle lies to the right of the imaginary axis if t is negative, and to the left of the imaginary axis if t is positive.
By the use of this lemma the contour of integration in (13) may be closed and the integration may then be performed by the method of residues. In the case
$$G(p) = \frac{1}{(p-a)^{m+1}} , \qquad m = 0, 1, 2, \ldots ,$$
we readily obtain
$$L^{-1}\left[ \frac{1}{(p-a)^{m+1}} \right] = \begin{cases} 0 & t < 0 \\ \dfrac{t^m e^{at}}{m!} & t > 0 . \end{cases} \qquad (18)$$
An important special case of (18), corresponding to a = 0, is
$$L^{-1}\left[ \frac{1}{p^{m+1}} \right] = \frac{t^m}{m!} \qquad (t > 0) . \qquad (19)$$
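Result (18) can be spot-checked by running the forward transform (12) on the right-hand side. The sketch assumes a = −1, m = 2, and p = 0.5, a point inside the region of convergence R(p) > a. It truncates the time integral at t = 60. The numerical transform of t^m e^(at)/m! should reproduce 1/(p − a)^(m+1) = 1/1.5³.

```python
import numpy as np
from math import factorial

a, m, p = -1.0, 2, 0.5                 # assumed values; p lies in R(p) > a
dt = 1e-3
t = np.arange(0.0, 60.0 + dt / 2, dt)  # infinite range truncated at t = 60

f = t**m * np.exp(a * t) / factorial(m)          # right-hand side of (18) for t > 0
y = f * np.exp(-p * t)
numeric = dt * (np.sum(y) - 0.5 * (y[0] + y[-1]))  # trapezoidal rule for (12)
exact = 1.0 / (p - a) ** (m + 1)
```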
Another useful theorem which is readily established by means of (12) and (13) is Borel's theorem.
Borel's Theorem
If g(p), g₁(p), g₂(p) are the Laplace transforms of f(t), f₁(t), f₂(t), respectively, and if
$$g(p) = g_1(p)\, g_2(p) ,$$
then
$$f(t) = \int_{0-}^{t+} f_1(t-x)\, f_2(x)\, dx = \int_{0-}^{t+} f_1(\tau)\, f_2(t-\tau)\, d\tau .$$
The functions f₁(t) and f₂(t) are subject to conditions which permit the inversion of the order of integration in the following proof. However, these conditions are seldom of any concern. We have
$$f(t) = L^{-1}\{ g_1(p) \cdot L[f_2(t)] \} = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} g_1(p)\, e^{pt} \left[ \int_{0-}^{\infty} f_2(x)\, e^{-px}\, dx \right] dp .$$
Inverting the order of integration and noting that
$$\frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} g_1(p)\, e^{p(t-x)}\, dp = \begin{cases} 0 & x > t \\ f_1(t-x) & x < t , \end{cases}$$
we obtain the result stated in the theorem.
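Borel's theorem says that multiplying transforms corresponds to the convolution integral in time. As a check under assumed data, take f₁(t) = e^(−t) and f₂(t) = e^(−2t), with transforms 1/(p + 1) and 1/(p + 2). Their product, 1/((p + 1)(p + 2)) = 1/(p + 1) − 1/(p + 2), is the transform of e^(−t) − e^(−2t), so the discretized convolution should reproduce that function.

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 5.0, dt)
f1 = np.exp(-t)          # assumed f1(t)
f2 = np.exp(-2.0 * t)    # assumed f2(t)

# Trapezoidal version of the integral from 0 to t of f1(t - x) f2(x) dx:
# full discrete convolution, then half-weight the two end points.
conv = np.convolve(f1, f2)[: len(t)] * dt
conv -= 0.5 * dt * (f1 * f2[0] + f1[0] * f2)

expected = np.exp(-t) - np.exp(-2.0 * t)   # inverse transform of g1(p) g2(p)
max_error = np.max(np.abs(conv - expected))
```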
A.5 ALTERNATIVE EXPRESSION OF THE
RESPONSE-TO-SIGNAL RELATIONSHIP
The result (8) obtained in Section A.2 suggests an operational expression of the form
$$V(t) = Y(p) \cdot E(t) \qquad (20)$$
for the response-to-signal relationship, whatever the signal E(t) might be. If the equivalence of this operational expression to (2) is taken as a matter of definition, we may readily discover the nature of the implied operation.
In the light of Borel's theorem, (2) may be expressed in the form
$$L[V(t)] = L[W(t)] \cdot L[E(t)]$$
under the permissible assumption that E(t) ≡ 0 when t < 0. Hence
$$V(t) = L^{-1}\{ L[W(t)] \cdot L[E(t)] \}$$
or, by (11),
$$V(t) = L^{-1}\{ Y(p) \cdot L[E(t)] \} . \qquad (21)$$
This is, therefore, in general the meaning of the operational expression (20).ᵈ
ᵈ We note that if S(p) = L[E(t)], the operational expression
$$V(t) = S(p) \cdot W(t)$$
is equivalent to (20). This form is used in Section 10.4 and in Appendix B.
The symmetry of the impulsive admittance is expressed by
$$W(T - t) = W(t) .$$
Since W(t) ≡ 0 when t < 0, it must be so also when t > T. Hence
$$Y(p) = \int_{0-}^{T/2} W(t)\, e^{-pt}\, dt + \int_{T/2}^{T+} W(t)\, e^{-pt}\, dt .$$
By a change of variable of integration the second term may be expressed in the form
$$\int_{0-}^{T/2} W(T-t)\, e^{-p(T-t)}\, dt$$
or, because of the symmetry,
$$e^{-pT} \int_{0-}^{T/2} W(t)\, e^{pt}\, dt .$$
Hence, if the first term in Y(p) be denoted by
$$Y_1(p) = \int_{0-}^{T/2} W(t)\, e^{-pt}\, dt ,$$
we have
$$Y(p) = Y_1(p) + Y_1(-p)\, e^{-pT} = \left[ Y_1(p)\, e^{pT/2} + Y_1(-p)\, e^{-pT/2} \right] e^{-pT/2} .$$
At real frequencies (p = iω) the bracketed factor is evidently an even real function of ω. Hence
$$Y(i\omega) = Q(\omega^2)\, e^{-i\omega T/2} . \qquad (24)$$
Apart from discontinuities in the phase angle of the transmission function at real frequencies ω for which Q(ω²) is zero, the phase angle is proportional to frequency. Such a transmission function is referred to as a linear phase transmission function. Consider the sinusoidal components of the signal whose frequencies are less than the lowest frequency at which Q(ω²) vanishes. These components suffer phase retardations in transmission in proportion to their frequencies, and therefore contribute no delay distortion. They are delayed by a uniform amount, just as they are in a properly terminated, distortionless, uniform transmission line. In the case of (24), however, they do contribute amplitude or loss distortion through Q(ω²). The delay in (24) is just half of the "smoothing time" T.

A.8 SERIES RELATIONSHIPS BETWEEN
IMPULSIVE ADMITTANCE AND
TRANSMISSION FUNCTION
Two useful series relationships between impulsive admittances and transmission functions will be derived in this section.
Assume that W(t) admits the series expansion
$$W(t) = A_0 + A_1 t + \cdots + \frac{A_m t^m}{m!} + \cdots \qquad (25)$$
for small positive values of t. Then by (11) and (19)
$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p^2} + \cdots + \frac{A_m}{p^{m+1}} + \cdots \qquad (26)$$
If A₀ ≠ 0, the transmission cannot drop off faster than 6 db per octave as the frequency increases indefinitely. If the transmission is to drop off ultimately at the rate of 6k db per octave, all of the A's up to and including A_(k−2) must be zero. That is to say, the impulsive admittance and all of its derivatives of orders up to and including the (k − 2)th must vanish at t = 0.
Next, let us suppose that the impulsive admittance and all of its derivatives of orders up to and including the (k − 2)th are continuous through all values of t including t = 0, except that the (k − 2)th derivative is discontinuous, only at t = a. We may resolve the impulsive admittance into the sum W₁(t) + W₂(t). Here W₁(t) and all of its derivatives of orders up to and including the (k − 2)th are continuous through all values of t including t = 0, while W₂(t) ≡ 0 for all values of t < a. Then, for small positive values of t − a,
$$W_2(t) = \frac{A_{k-2}\, (t-a)^{k-2}}{(k-2)!} + \cdots \qquad (A_{k-2} \neq 0),$$
whence, by the translation theorem and (19),
$$L[W_2(t)] = e^{-pa} \left[ \frac{A_{k-2}}{p^{k-1}} + \cdots \right] .$$
Hence the transmission cannot drop off ultimately faster than 6(k − 1) db per octave. We may summarize these results in the asymptotic loss theorem.
Asymptotic Loss Theorem.
If the transmission is to drop off ultimately
at the rate of 6k db per octave as the frequency
increases indefinitely, the impulsive admittance
and all of its derivatives of orders up to and
including the (k — 2)th must be continuous
through all values of t including t = 0.
Discontinuities in W(t) or in some deriva-
tive of W(t) cannot occur except at t = 0 in
the case of physical lumped element networks.
Practically, however, rapid changes in W(t)
or in some derivative of W(t), at any value of
t, may be expected to be associated with much
the same behavior of the transmission at rea-
sonably high frequencies. As an example con-
sider the case
$$W(t) = \frac{e^{-\alpha t} - e^{-\beta t}}{\beta - \alpha} \quad (\beta > \alpha > 0), \qquad Y(p) = \frac{1}{(p+\alpha)(p+\beta)} .$$
W(t) is continuous through t = 0 as long as β is finite but becomes discontinuous there in the limit as β → ∞. The first derivative of W(t) is discontinuous through t = 0 even when β is finite. The ultimate slope of the transmission is 12 db per octave, in accordance with the asymptotic loss theorem, but in the range α < ω < β the transmission appears to have a slope of only 6 db per octave.
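The two slopes quoted for this example can be read off |Y(iω)| directly. The sketch assumes α = 1 and β = 100; any β much larger than α behaves similarly. It measures the drop in 20 log₁₀|Y(iω)| over one octave in two places. The first is at ω = √(αβ) = 10, midway between the two break frequencies on a logarithmic scale. The second is far above β.

```python
import numpy as np

alpha, beta = 1.0, 100.0        # assumed values with beta >> alpha

def Y(p):
    return 1.0 / ((p + alpha) * (p + beta))

def slope_db_per_octave(w):
    """Drop in 20 log10 |Y(i w)| between w and 2w."""
    return 20 * np.log10(abs(Y(1j * w)) / abs(Y(2j * w)))

mid_band = slope_db_per_octave(np.sqrt(alpha * beta))   # alpha < w < beta
ultimate = slope_db_per_octave(1e5)                      # w far above beta
```

With these values the mid-band figure comes out near 6.1 db per octave and the high-frequency figure near 12.0, in line with the text. An exact halving of amplitude per octave is 20 log₁₀ 2 ≈ 6.02 db.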
The importance of the observations made in
the preceding paragraph, in the design of a
network, is that if we attempt to approximate
a W(t) which has a discontinuity in a deriva-
tive of lower order at t = a than at t = 0, the
fact that the physical approximation must have
continuous derivatives of all orders and through
all values of t except t = 0 is not very significant.
The ultimate slope of the transmission
may not be reached until the frequency is too
high to be of any importance.
Another useful relationship between impulsive admittance and transmission function follows from the assumption that
$$\int_{0-}^{\infty} t^m\, W(t)\, dt$$
is finite for m = 1, 2, .... If we expand the exponential in
$$Y(p) = \int_{0-}^{\infty} W(t)\, e^{-pt}\, dt$$
into a power series in pt, we get
$$Y(p) = M_0 - M_1 p + \frac{M_2 p^2}{2!} - \frac{M_3 p^3}{3!} + \cdots \qquad (27)$$
where
$$M_m = \int_{0-}^{\infty} t^m\, W(t)\, dt . \qquad (28)$$
The quantity M_m is the mth moment of the impulsive admittance.
When M₀ = 1 we speak of the response of the network as a weighted average of the impressed signal, and speak of the impulsive admittance W(t) as the weighting function.

A.9 PHYSICAL RESTRICTIONS ON THE
TRANSMISSION FUNCTION
The transmission function Y(p) of a lumped element network is a rational algebraic function of p. It is real for real values of p (A.2). Hence, the coefficients must be real, and therefore the roots and poles must either be real or occur in conjugate complex pairs.
Such a function may be expanded into the sum of a polynomial and a rational function whose numerator is of lower degree than the denominator. The latter may therefore be properly expanded into partial fractions. For a partial fraction of the form
$$\frac{A}{(p-a)^m} , \qquad m = 1, 2, \ldots ,$$
the contribution to the impulsive admittance W(t) is, by (18),
$$L^{-1}\left[ \frac{A}{(p-a)^m} \right] = \frac{A\, t^{m-1} e^{at}}{(m-1)!} \qquad (t > 0) .$$
For a pair of partial fractions of the form
$$\frac{A + iB}{(p - a + i\beta)^m} + \frac{A - iB}{(p - a - i\beta)^m}$$
the contribution to the impulsive admittance is
$$\frac{2\, t^{m-1} e^{at}}{(m-1)!} \left( A \cos \beta t + B \sin \beta t \right) .$$
Since the impulsive admittance is the response to an impulsive signal, a stable network must have an impulsive admittance free of terms which increase indefinitely with time. Such terms arise on account of an amplitude factor of the form e^(at) where a > 0, or, in the event that a = 0, on account of an amplitude factor of the form t^(m−1) where m > 1. Hence, the physical restrictions on the transmission function are:
1. No poles with positive real parts.
2. Poles on the imaginary p axis must be simple.ᵉ
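The two restrictions can be checked mechanically for a given denominator. The sketch below assumes coefficients listed in ascending powers of p. It uses a numerical tolerance both to decide whether a pole lies on the imaginary axis and to detect repeated poles there; the tolerance is an assumption, since computed roots are never exact.

```python
import numpy as np

def satisfies_restrictions(b, tol=1e-6):
    """b = [b0, b1, ..., bn] are the denominator coefficients of Y(p), ascending.
    Returns True if (1) no pole has a positive real part and (2) every pole on
    the imaginary axis is simple."""
    poles = np.roots(b[::-1])                      # np.roots expects descending powers
    if np.any(poles.real > tol):
        return False                               # violates restriction 1
    on_axis = poles[np.abs(poles.real) <= tol]
    for i, z in enumerate(on_axis):
        others = np.delete(on_axis, i)
        if np.any(np.abs(others - z) <= tol):
            return False                           # repeated pole on the axis: restriction 2
    return True
```

For example, (1 + p)² passes, and so does 1 + p², whose poles at ±i are simple. By contrast, (1 + p²)² and p² − 1 fail.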
The poles of a passive transmission function correspond to modes of free motion. Each of them may be shown to satisfy an equation of the form
$$pT + F + \frac{V}{p} = 0 ,$$
where T, F, V are positive quantities whose values depend upon the particular mode and its activity. However, T is zero in the absence of kinetic energy, F is zero in the absence of energy dissipation, and V is zero in the absence of potential energy. It follows that in the absence of coils or in the absence of condensers, the transmission function must have poles only on the negative real p axis.
ᵉ Poles on the imaginary p axis must also be ruled out on the ground that persistent transients cannot be tolerated any more than growing ones.
For extremely narrow-band, low-pass appli-
cations, such as data smoothing, it is not prac-
ticable to build networks which call for coils
because these generally turn out to be of many
thousands of henries in inductance. The exclu-
sion of coils from these applications does not,
however, rule out transmission functions with
complex poles. These may be realized with RC
networks in feedback amplifier circuits as is
shown in Chapter 12.
A.10 QUASI-DISTORTIONLESS
TRANSMISSION NETWORKS
A quasi-distortionless transmission network
is one which is distortionless only in a certain
sense. This sense will be made clear in this
section.
Let
$$Y(p) = \frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} . \qquad (29)$$
This may also be written in the form
$$Y(p) = 1 + c_1 p + c_2 p^2 + \cdots + c_r p^r + p^{r+1} g(p) . \qquad (30)$$
Obviously g(p) will be a rational function with the same denominator as Y(p) and a numerator of (n − 1)th degree. If we now apply a signal of the form
$$E(t) = \begin{cases} 0 & t < 0 \\ t^r & t > 0 , \end{cases}$$
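the response, by (21), will be
$$V(t) = t^r + r c_1 t^{r-1} + \frac{r!}{(r-2)!} c_2 t^{r-2} + \cdots + r!\, c_r + r!\, L^{-1}[g(p)] \qquad (t > 0) .$$
If the coefficients in the rational expression for Y(p) are such that
$$c_1 = t_f , \quad c_2 = \frac{t_f^2}{2!} , \quad \ldots , \quad c_r = \frac{t_f^r}{r!} , \qquad (31)$$
then
$$V(t) = (t + t_f)^r + r!\, L^{-1}[g(p)] \qquad (t > 0) . \qquad (32)$$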
The second term vanishes exponentially with time. The first term is an advanced or a retarded facsimile of the applied signal according to whether t_f is positive or negative. We shall say that Y(p) is the transmission function of a network which is quasi-distortionless to the signal t^r.
Obviously a transmission network which is quasi-distortionless to the signal t^r must also be quasi-distortionless to every signal t^s, where s is a nonnegative integer less than r. Hence we may state the quasi-distortionless transmission theorem.
Quasi-Distortionless Transmission Theorem
Suppose the signal
$$E(t) = \begin{cases} 0 & t < 0 \\ \text{a polynomial in } t \text{ of degree at most } r & t > 0 \end{cases}$$
is applied to a "quasi-distortionless transmission network of order r." Then the response will be of the form
$$V(t) = E(t + t_f) + O(e^{-t}) \qquad (t > 0),$$
where O(e^(−t)) stands for terms which vanish exponentially with time.
If t_f > 0, the transmission network is a predictor for polynomials of degree r at most. However, it does not begin to predict properly until some time has elapsed after the start of the signal, or of a new analytic segment of the signal; that is, until the transients have subsided sufficiently.
If t_f = 0, the transmission network may be regarded as a delay-corrected smoother for polynomials of degree r at most. This is obtained simply by taking
$$a_1 = b_1 , \quad a_2 = b_2 , \quad \ldots , \quad a_r = b_r \qquad (33)$$
in (29).
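Condition (31) says that the first r + 1 Taylor coefficients of Y(p) at p = 0 must match those of e^(p t_f), and (33) is the special case t_f = 0. The sketch below computes the coefficients c_k of (30) by power-series division and tests them. The denominator (1 + p)³, the value t_f = 0.5, and the numerator chosen to satisfy (31) are hypothetical examples for illustration, not designs from the text.

```python
from math import factorial

def taylor_coeffs(N, D, K):
    """First K + 1 Taylor coefficients of N(p)/D(p) about p = 0.
    N and D are coefficient lists in ascending powers of p, with D[0] == 1."""
    c = []
    for k in range(K + 1):
        nk = N[k] if k < len(N) else 0.0
        # N = D * (c0 + c1 p + ...), so c_k = N_k - sum over j >= 1 of D_j c_{k-j}.
        s = sum(D[j] * c[k - j] for j in range(1, min(k, len(D) - 1) + 1))
        c.append(nk - s)
    return c

def is_quasi_distortionless(N, D, r, tf, tol=1e-12):
    """Check (31): c_k = tf**k / k! for k = 0..r (c_0 = 1 since N[0] = D[0] = 1)."""
    c = taylor_coeffs(N, D, r)
    return all(abs(c[k] - tf**k / factorial(k)) <= tol for k in range(r + 1))

# Hypothetical order-2 predictor with denominator (1 + p)^3 and lead tf = 0.5.
tf = 0.5
D = [1.0, 3.0, 3.0, 1.0]
N = [1.0, D[1] + tf, D[2] + D[1] * tf + tf**2 / 2]   # chosen so that c1 = tf, c2 = tf^2/2

# Delay-corrected smoother of the same order: a1 = b1, a2 = b2, as in (33).
N_smoother = [1.0, D[1], D[2]]
```

All of the hypothetical denominator's roots lie at p = −1, so the transient term r! L⁻¹[g(p)] in (32) dies out like a polynomial times e^(−t).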
A.11 VARIABLE LINEAR NETWORKS
A variable linear transmission network is
one in which the response V(t) is related to the
impressed signal E(t) by the linear differential
equation (1) with coefficients which are pre-
scribed functions of t. The solutions of such a
differential equation also obey the superposi-
tion principle. Thus it is possible in this case
also to formulate the response of the network
to any signal in terms of its response to a
standard impulsive signal.
The response of a variable network to an
impulse or any form of signal depends, how-
ever, on the time at which the signal is applied. For an impulsive signal applied at time λ, the response at time t will be represented by W(t, λ). This is still called the "impulsive admittance." In the theory of linear differential equations it is known as a Green's function. Physically, it must be identically zero for t < λ.
The superposition theorem may now be written in the form
$$V(t) = \int_{0-}^{t+} E(\lambda)\, W(t, \lambda)\, d\lambda \qquad (34)$$
provided the network has been properly designed and set into operation at t = 0. If
$$\int_{0-}^{t+} W(t, \lambda)\, d\lambda = 1$$
for all values of t > 0, the response may be interpreted as a weighted average of the signal. We note that in order to interpret the response as a weighted average of the signal, it is now no longer necessary to take the lower limit in (34) as −∞, as it was in the case of
(2) for a fixed network. In other words, a
variable network can be designed and set into
operation at any time so that components of
the signal which arrive before that time are
completely ignored.
The analysis and design of variable linear
networks are in general much more difficult
than those of fixed linear networks. This is due
largely to the fact that there does not yet exist
a technique corresponding to the steady-state
and operational methods used in connection
with fixed networks. However, there is a class
of variable networks whose analysis and design
are greatly facilitated by the fact that they are
related to fixed networks by a transformation
of the time variable.
Consider the linear differential equation
$$b_n \frac{d^n V}{dz^n} + b_{n-1} \frac{d^{n-1} V}{dz^{n-1}} + \cdots + b_1 \frac{dV}{dz} + V = E$$
with constant coefficients. With appropriate restrictions on the roots of the characteristic function
$$b_n \lambda^n + b_{n-1} \lambda^{n-1} + \cdots + b_1 \lambda + 1 ,$$
it represents the response-to-signal relationship in a fixed network if z is proportional directly to time. However, if z is a more general function of the time, it will correspond to a variable network. The kind of transformation which is desired here is one which transforms the range −∞ < z < +∞ into the range 0 < t < +∞ with a one-to-one correspondence. Thus, we may take z = log θ(t), where θ(t) is a positive monotonic increasing function of t in the range 0 < t < +∞, with θ(t) → 0 as t → 0. Several examples of θ(t), including θ(t) = t, are considered in detail in Chapter 14.
APPENDIX B
THEORETICAL MODIFICATIONS OF SMOOTHING FUNCTIONS TO FIT
NONUNIFORM NOISE SPECTRA
Best smoothing or weighting functions have been determined in Chapters 10 and 11 under the assumption of random noise with flat spectrum. It has not been worth while in practice to base the choice of best weighting functions on any more elaborate considerations of actual noise spectra, for at least three reasons:
1. The effectiveness of a smoothing network is not very sensitive to the exact shape of the weighting function.
2. Noise spectra are subject to variations,
due to factors which it is not desirable in prac-
tice to attempt to control.
3. Elaborate smoothing functions require
elaborate networks with close tolerances on ele-
ment values.
Nevertheless, the theory of smoothing pre-
sented in this monograph would not be com-
plete without showing how more general shapes
of noise spectra can be considered. Two meth-
ods are presented here, which are generaliza-
tions of those presented in Sections 10.3 and
10.4, respectively.
B.1 PHILLIPS AND WEISS THEORY⁷
Let g(t) be the tracking error, and W(τ) the impulsive admittance of a smoothing and prediction circuit with smoothing time T. Then the error in prediction due to tracking error only is
$$\epsilon(t) = \int_{0}^{T} g(t-\tau)\, W(\tau)\, d\tau .$$
The impulsive admittance W(τ) will depend also upon the time of flight which, for purposes of analysis, is assumed to be constant. The mean square error is then
$$\overline{\epsilon^2} = \lim_{L \to \infty} \frac{1}{2L} \int_{-L}^{L} \epsilon^2(t)\, dt = \int_0^T \!\! \int_0^T W(\tau_1)\, C(\tau_1 - \tau_2)\, W(\tau_2)\, d\tau_1\, d\tau_2$$
where
$$C(x) = \lim_{L \to \infty} \frac{1}{2L} \int_{-L}^{L} g(\lambda)\, g(\lambda + x)\, d\lambda . \qquad (1)$$
C(x) is the autocorrelation of the error time-function g(λ).
For an nth order smoothing and prediction circuit, the mean square error is now minimized with respect to the impulsive admittance under the restrictions*
$$\int_0^T \tau^m\, W(\tau)\, d\tau = (-t_f)^m \qquad (m = 0, 1, 2, \ldots, n). \qquad (2)$$
Hence W(τ) must satisfy the integral equation
$$\int_0^T C(t-\tau)\, W(\tau)\, d\tau = k_0 + k_1 t + \cdots + k_n t^n \qquad (0 \le t \le T)$$
where the k_m are constants to be determined. Now, if
$$\int_0^T C(t-\tau)\, W_m(\tau)\, d\tau = t^m \qquad (0 \le t \le T;\ m = 0, 1, 2, \ldots, n) \qquad (3)$$
then
$$W(\tau) = k_0 W_0(\tau) + k_1 W_1(\tau) + \cdots + k_n W_n(\tau) . \qquad (4)$$
The procedure is then to determine C(x) from (1), the W_m(τ) from (3), the k_m from (2) and (4), and finally W(τ) from (4). It may be noted that, in general, every k_m will be a polynomial of nth degree in t_f. Hence the W_m(τ) appearing here are not the same as those defined in Chapter 11, although W(τ) should be the same if the same W₀(τ) is used in Chapter 11.
A difficulty of the theory given above is in the solution of the integral equations (3). This difficulty is avoided in the theory given in the next section. However, the integral equations are easily solved in the case of flat random noise, when C(x) is simply an impulse of strength K, say, at x = 0. Then
$$W_m(\tau) = \frac{\tau^m}{K} , \qquad 0 \le \tau \le T .$$
Since the strength is irrelevant, it may be taken equal to T so that W₀(τ) will be normalized.
*These follow from the discussions in Sections A.8 and A.10, especially equations (27), (28), (30), and …
For a linear prediction circuit it is then found that
$$W(\tau) = 2\left(2 + \frac{3t_f}{T}\right) W_0(\tau) - \frac{6}{T}\left(1 + \frac{2t_f}{T}\right) W_1(\tau) .$$
Putting T = 1, this may be expressed as
$$W(\tau) = W_0(\tau) + G_1(-t_f)\, W_1(\tau)$$
in terms of the G_m(τ) and W_m(τ) of Section 11.3.
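For flat noise the linear-prediction weights can be recomputed directly from the restrictions (2). Take W_m(τ) = τ^m/T on 0 ≤ τ ≤ T, with the impulse strength K taken equal to T. Then W(τ) = (k₀ + k₁τ)/T, and the two conditions ∫W dτ = 1 and ∫τW dτ = −t_f form a 2 × 2 linear system. The values T = 2 and t_f = 0.5 below are arbitrary choices. The solution can be compared with the coefficients of W₀ and W₁ in the expression above, namely 2(2 + 3t_f/T) and −(6/T)(1 + 2t_f/T).

```python
import numpy as np

def linear_predictor_k(T, tf):
    """Solve restrictions (2) with n = 1 for k0, k1, assuming flat noise so that
    W0(tau) = 1/T and W1(tau) = tau/T on 0 <= tau <= T."""
    M = np.array([[1.0,   T / 2],       # integrals of W0 and W1
                  [T / 2, T**2 / 3]])   # integrals of tau*W0 and tau*W1
    rhs = np.array([1.0, -tf])          # (-tf)^0 and (-tf)^1
    return np.linalg.solve(M, rhs)

T, tf = 2.0, 0.5                        # arbitrary illustrative values
k0, k1 = linear_predictor_k(T, tf)
```

Here k₀ = 5.5 and k₁ = −4.5, so W(τ) falls linearly across the smoothing interval. Because t_f > 0, W(τ) becomes negative for the oldest data.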
« SYMMETRY OF BEST SMOOTHING
FUNCTIONS
The theory of Phillips and Weiss offers the most direct proof that the best smoothing or weighting function must be symmetrical, regardless of the noise power spectrum. The situation is that of minimizing (1) under only one of the restrictions (2), viz., the normalizing condition

$$\int_0^T W(\tau)\,d\tau = 1. \qquad (5)$$

The weighting function is therefore determined, up to a constant scale factor, by the condition that

$$\int_0^T C(t-\tau)\,W(\tau)\,d\tau = k \qquad (0 \le t \le T), \qquad (6)$$

where $k$ is a constant. Substituting $T - t$ for $t$ and $T - \tau$ for $\tau$, we have

$$\int_0^T C(\tau - t)\,W(T-\tau)\,d\tau = k. \qquad (7)$$

Since $C(-x) = C(x)$, (7) states that $W(T - \tau)$ also satisfies (6), and it clearly satisfies (5) as well. Since $W(\tau)$ is determined uniquely by (6) and (5), it follows from (6) and (7) that

$$W(T - \tau) = W(\tau). \qquad (8)$$
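A small numerical illustration of (5) to (8): discretizing (6) on a uniform midpoint grid turns it into a linear system whose matrix $C(t_i - \tau_j)$ is symmetric and Toeplitz, so reversing the grid ($\tau \to T - \tau$) maps the system onto itself, exactly as in the argument leading to (8). The covariance below is an arbitrary even, positive-definite choice made for illustration; it is not taken from the report. Because (6) is an integral equation of the first kind, the discrete solution develops large end values, the grid analogue of impulses at the ends of the interval.

```python
import numpy as np


def best_weights(cov, T=1.0, N=400):
    """Discretize (6) on a midpoint grid and impose (5).

    cov: even covariance function C(x) (assumed positive definite).
    Returns the grid points t and the weights W(t).
    """
    dt = T / N
    t = (np.arange(N) + 0.5) * dt
    C = cov(t[:, None] - t[None, :])         # C(t_i - tau_j): symmetric Toeplitz
    w = np.linalg.solve(C * dt, np.ones(N))  # solution of (6) with k = 1
    w /= w.sum() * dt                        # rescale so that (5) holds
    return t, w


def cov(x):
    """Illustrative even covariance: exponential times (1 + 0.5 cos 5x)."""
    return np.exp(-3.0 * np.abs(x)) * (1.0 + 0.5 * np.cos(5.0 * x))


t, w = best_weights(cov)
# Relative asymmetry of the computed weighting function (round-off level only).
print(np.max(np.abs(w - w[::-1])) / np.max(np.abs(w)))
```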
GENERALIZATION OF ELEMENTARY PULSE METHOD
The noise power transmitted through a network may be expressed in the familiar form

$$P = \int N(\omega^2)\,|Y(i\omega)|^2\,d\omega,$$

where $N(\omega^2)$ is the noise power spectrum and $Y(p)$ is the transmission function of the network. Assuming that $N(\omega^2)$ is a rational function of $\omega^2$ which is finite at all finite values of $\omega$, including zero, it is possible to determine a rational function $S(p)$ which has no poles on or to the right of the imaginary axis in the $p$-plane, with the exception of the point at infinity, and such that

$$|S(i\omega)|^2 = N(\omega^2).$$
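For a concrete instance of the factorization $|S(i\omega)|^2 = N(\omega^2)$, take the (assumed) spectrum $N(\omega^2) = (\omega^2 + \alpha^2)/(\omega^2 + \beta^2)$ with $\beta > 0$. Then $S(p) = (p + \alpha)/(p + \beta)$ has its only pole at $p = -\beta$, to the left of the imaginary axis, which is the form of $S(p)$ used in the Example of Case II. The snippet simply confirms the identity at a few frequencies; the parameter values are arbitrary.

```python
def noise_spectrum(omega, alpha, beta):
    """Assumed rational spectrum N(omega^2) = (omega^2 + alpha^2)/(omega^2 + beta^2)."""
    return (omega**2 + alpha**2) / (omega**2 + beta**2)


def spectral_factor(p, alpha, beta):
    """S(p) = (p + alpha)/(p + beta); its only pole, p = -beta, is in the left half-plane."""
    return (p + alpha) / (p + beta)


alpha, beta = 0.5, 2.0
for omega in (0.0, 0.3, 1.0, 7.5):
    s = spectral_factor(1j * omega, alpha, beta)
    print(omega, abs(s) ** 2, noise_spectrum(omega, alpha, beta))
```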
It may be readily shown that

$$P = 2\pi \int_{0-}^{\infty} [F(t)]^2\,dt, \qquad (9)$$

where $F(t)$ is related to the impulsive admittance $W(t)$ by the operational equation

$$F(t) = S(p)\,W(t). \qquad (10)$$
The problem is now to minimize (9) under the restriction

$$\int_{0-}^{t} W(\tau)\,d\tau = 1 \quad \text{when } t > 1. \qquad (11)$$

Let

$$S(p) = A\,\frac{Q(p)}{R(p)},$$

where

$$Q(p) = (p + \alpha_1)(p + \alpha_2)\cdots(p + \alpha_m),$$
$$R(p) = (p + \beta_1)(p + \beta_2)\cdots(p + \beta_n),$$

and $A$ is of no consequence. One or more of the $\alpha$'s, but none of the $\beta$'s, may be zero. Since the existence of the integral in (9) requires that $F(t)$ have no discontinuities of higher type than finite jumps in the range $0- < t < \infty$, the continuity conditions on $W(t)$ in (10) must depend upon the difference between $m$ and $n$ in the expressions for $Q(p)$ and $R(p)$.
If $m \ge n$, it is fairly obvious that $W(t)$ must be differentiable, in the ordinary sense, exactly $m - n$ times. In other words, $W(t)$ and all its derivatives up to and including the $(m - n - 1)$th must be continuous, but the $(m - n)$th derivative may have finite jumps. If $m < n$, we must consider the introduction into $W(t)$ of discontinuities of higher type than finite jumps. These discontinuities arise in the formal extension of the concept of differentiation to functions containing finite jumps.
If a function $\phi(t)$ has a finite jump of amplitude $A_0$ at $t = a$, the value of $\phi'(t)$ at that point will be indicated formally as $A_0\,\delta_0(t - a)$, where $\delta_0(t - a)$ is a unit impulse at $t = a$. If $\phi'(a + 0) - \phi'(a - 0) = A_1$, the value of $\phi''(t)$ at $t = a$ will be indicated formally as $A_0\,\delta_1(t - a) + A_1\,\delta_0(t - a)$, where $\delta_1(t - a)$ is a unit doublet at $t = a$. And so on, for higher derivatives of $\phi(t)$.
The expression (9) is a minimum under the restriction (11) if $W(t)$ satisfies the differential equation

$$Q(p)\,Q(-p)\,W(t) = \text{const.} \quad \text{when } 0 < t < 1, \qquad (12)$$

and $Y(p)$ satisfies the condition

$$\frac{1}{2\pi i}\int_{-i\infty}^{i\infty} S(p)\,S(-p)\,Y(p)\,e^{pt}\,dp = \text{const.} \quad \text{when } 0 < t < 1. \qquad (13)$$

The restriction (11) itself requires that $W(t) = 0$ when $t > 1$, and

$$\int_{0-}^{1+} W(t)\,dt = 1. \qquad (14)$$
Case I. ($n = 0$)
The general solution of (12) contains $2m + 1$ constants of integration, which are determined by (14) and the $2m$ continuity conditions that $W(t)$ and all of its derivatives up to and including the $(m - 1)$th must vanish at $t = 0$ and $t = 1$.
Case II. ($n \ne 0$, $m \ge n$)
The general solution of (12) contains $2m + 1$ constants of integration, which are reduced to $2n$ in number by (14) and the $2(m - n)$ continuity conditions that $W(t)$ and all of its derivatives up to and including the $(m - n - 1)$th must vanish at $t = 0$ and at $t = 1$. The remaining $2n$ constants are determined by (13).
The left-hand member of (13) may be formulated by the method of residues. The expression for $Y(p)$ should first be separated into two parts, so that

$$Y(p) = Y_L(p) + Y_R(p)\,e^{-p},$$

where $Y_L(p)$ and $Y_R(p)$ are rational functions of $p$. The path of integration in (13) may then be closed in the left-hand half of the $p$-plane for the first part of $Y(p)$, and in the right-hand half for the second part. Hence, if the sum of the residues of $S(p)\,S(-p)\,Y_L(p)\,e^{pt}$ in the left-hand half of the $p$-plane is denoted by $\Sigma_L$, and if the sum of the residues of $S(p)\,S(-p)\,Y_R(p)\,e^{p(t-1)}$ in the right-hand half of the $p$-plane is denoted by $\Sigma_R$, then the condition (13) reduces to

$$\Sigma_L - \Sigma_R = \text{const.} \qquad (15)$$
Case III. ($n \ne 0$, $m < n$)
The $2m + 1$ constants of integration in the general solution of (12) are first increased to $2n + 1$ by appending the $2(n - m)$ singularities

$$\delta_0(t),\ \delta_1(t),\ \ldots,\ \delta_{n-m-1}(t), \qquad \delta_0(t-1),\ \delta_1(t-1),\ \ldots,\ \delta_{n-m-1}(t-1),$$

and then reduced to $2n$ by (14). The remainder are determined by (13) or (15).
In formulating $Y(p)$ it may be noted that

$$\mathcal{L}[\delta_m(t - a)] = p^m e^{-ap} \qquad (a \ge 0).$$

Example of Case I
Let $S(p) = p^m$. The differential equation (12) requires $W(t)$ to be a polynomial of degree $2m$. The conditions at $t = 0$ require it to have a factor $t^m$, and those at $t = 1$ a factor $(1 - t)^m$. This leaves only (14) to be satisfied. Hence

$$W(t) = \frac{(2m + 1)!}{(m!)^2}\,[t(1 - t)]^m \qquad (0 \le t \le 1),$$

in agreement with (8) of Section 10.8.
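The normalizing constant $(2m+1)!/(m!)^2$ is just the reciprocal of the Beta integral $\int_0^1 t^m(1-t)^m\,dt = (m!)^2/(2m+1)!$. The sketch below checks this exactly with rational arithmetic by expanding $W(t)$ into powers of $t$; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def case1_weight(m):
    """Ascending polynomial coefficients of W(t) = (2m+1)!/(m!)^2 [t(1-t)]^m."""
    scale = Fraction(factorial(2 * m + 1), factorial(m) ** 2)
    coeffs = [Fraction(0)] * (2 * m + 1)
    # [t(1-t)]^m = sum_k C(m, k) (-1)^k t^(m+k)
    for k in range(m + 1):
        coeffs[m + k] = scale * comb(m, k) * (-1) ** k
    return coeffs


def integral_0_to_1(coeffs):
    """Exact integral over [0, 1] of sum_j c_j t^j."""
    return sum(c / (j + 1) for j, c in enumerate(coeffs))


for m in range(6):
    print(m, integral_0_to_1(case1_weight(m)))
```

Each printed integral is exactly 1, so (14) holds for every $m$; for $m = 1$ the weighting function is $6t(1 - t)$.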
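Example of Case II
Let

$$S(p) = \frac{p + \alpha}{p + \beta}.$$

Then, by (12),

$$W(t) = A_0 + A_1 e^{-\alpha t} + A_2 e^{\alpha t} \qquad (0 \le t \le 1).$$

Hence

$$Y(p) = \left[\frac{A_0}{p} + \frac{A_1}{p + \alpha} + \frac{A_2}{p - \alpha}\right] - \left[\frac{A_0}{p} + \frac{A_1 e^{-\alpha}}{p + \alpha} + \frac{A_2 e^{\alpha}}{p - \alpha}\right]e^{-p}.$$

Condition (15) is satisfied if

$$A_1 = \tfrac{1}{2}\,Q A_0\,e^{\alpha/2}, \qquad A_2 = \tfrac{1}{2}\,Q A_0\,e^{-\alpha/2},$$

where

$$Q = \frac{\alpha^2 - \beta^2}{\beta\left(\alpha \sinh\frac{\alpha}{2} + \beta \cosh\frac{\alpha}{2}\right)}.$$

Hence, using (14),

$$W(t) = \frac{1 + Q \cosh \alpha\left(t - \frac{1}{2}\right)}{1 + \frac{2Q}{\alpha}\sinh\frac{\alpha}{2}} \qquad (0 \le t \le 1).$$

In the limit as $\alpha \to 0$,

$$W(t) = \frac{1 + 6k\,t(1 - t)}{1 + k} \qquad (0 \le t \le 1),$$

where $k = \frac{1}{6}\left[\beta^2/(2 + \beta)\right]$; that is, a weighted mean of the uniform weighting function and the function $6t(1 - t)$ of the Example of Case I with $m = 1$ (cf. expressions (12), Section 11.3). This is reminiscent of Stibitz's results mentioned in Section 10.3.

Example of Case III
Let $S(p) = 1/(1 + p/\beta)$. Then, by (12) and the rule for appending singularities in Case III,

$$W(t) = A_0 + A_1\,\delta_0(t) + A_2\,\delta_0(t - 1) \qquad (0 \le t \le 1).$$

Hence

$$Y(p) = \left[\frac{A_0}{p} + A_1\right] + \left[A_2 - \frac{A_0}{p}\right]e^{-p}.$$

Condition (15) is satisfied if $A_1 = A_2 = A_0/\beta$. Hence

$$W(t) = \frac{1 + \frac{1}{\beta}\left[\delta_0(t) + \delta_0(t - 1)\right]}{1 + \frac{2}{\beta}} \qquad (0 \le t \le 1).$$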
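The closed forms for the Case II and Case III examples can be checked numerically. The sketch assumes, as follows from (9) to (13), that minimizing the noise power is equivalent to requiring that the noise covariance (the inverse Fourier transform of $N(\omega^2) = |S(i\omega)|^2$) convolved with $W$ be constant on $0 < t < 1$. For $S(p) = (p + \alpha)/(p + \beta)$ that covariance is $\delta(x) + \frac{\alpha^2 - \beta^2}{2\beta}e^{-\beta|x|}$, and for $S(p) = 1/(1 + p/\beta)$ it is proportional to $e^{-\beta|x|}$. The formulas tested are $W(t) = [1 + Q\cosh\alpha(t - \frac12)]/[1 + \frac{2Q}{\alpha}\sinh\frac{\alpha}{2}]$ with $Q = (\alpha^2 - \beta^2)/[\beta(\alpha\sinh\frac{\alpha}{2} + \beta\cosh\frac{\alpha}{2})]$, its $\alpha \to 0$ limit $[1 + 6k\,t(1-t)]/(1+k)$ with $k = \beta^2/[6(2+\beta)]$, and $W(t) = [\beta + \delta_0(t) + \delta_0(t-1)]/(\beta + 2)$. Grid sizes and parameter values are arbitrary illustrative choices.

```python
import numpy as np


def case2_closed_form(t, alpha, beta):
    """Case II weighting function for S(p) = (p + alpha)/(p + beta), alpha > 0."""
    Q = (alpha**2 - beta**2) / (
        beta * (alpha * np.sinh(alpha / 2) + beta * np.cosh(alpha / 2)))
    num = 1 + Q * np.cosh(alpha * (t - 0.5))
    return num / (1 + (2 * Q / alpha) * np.sinh(alpha / 2))


def case2_limit(t, beta):
    """alpha -> 0 limit: [1 + 6k t(1-t)]/(1 + k), k = beta^2 / (6 (2 + beta))."""
    k = beta**2 / (6 * (2 + beta))
    return (1 + 6 * k * t * (1 - t)) / (1 + k)


def case2_numerical(alpha, beta, N=800):
    """Solve W(t) + g int_0^1 exp(-beta|t-s|) W(s) ds = const on a midpoint grid,
    with g = (alpha^2 - beta^2)/(2 beta), then normalize as in (14)."""
    dt = 1.0 / N
    t = (np.arange(N) + 0.5) * dt
    g = (alpha**2 - beta**2) / (2 * beta)
    K = np.exp(-beta * np.abs(t[:, None] - t[None, :]))
    w = np.linalg.solve(np.eye(N) + g * dt * K, np.ones(N))
    w /= w.sum() * dt
    return t, w


def case3_response(t, beta, M=20000):
    """int_0^1 exp(-beta|t-s|) W(s) ds for W = [beta + d0(s) + d0(s-1)]/(beta + 2).
    For an optimal W this must be the same for every 0 < t < 1 (namely 2/(beta + 2))."""
    s = (np.arange(M) + 0.5) / M
    smooth = beta * np.sum(np.exp(-beta * np.abs(t - s))) / M
    impulses = np.exp(-beta * t) + np.exp(-beta * (1 - t))
    return (smooth + impulses) / (beta + 2)


t, w = case2_numerical(2.0, 1.0)
print(np.max(np.abs(w - case2_closed_form(t, 2.0, 1.0))))
t, w = case2_numerical(0.0, 3.0)
print(np.max(np.abs(w - case2_limit(t, 3.0))))
print([case3_response(x, 1.5) for x in (0.1, 0.5, 0.9)], 2 / 3.5)
```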
BIBLIOGRAPHY
PART II
1. The Extrapolation, Interpolation and Smoothing of
Stationary Time Series with Engineering Applica-
tions, Norbert Wiener, OSRD 870, Report to the
Services 19, Research Project DIC-6037, The Mas-
sachusetts Institute of Technology, Feb. 1, 1942.
Div. 7-318.1-M2
la. Ibid., Chapter 1.
2. The Analysis and Design of Servomechanisms,
Herbert Harris, Jr., OSRD 454, Progress Report to
the Services 23, The Massachusetts Institute of
Technology. Div. 7-321.1-M7
3. Behavior and Design of Servomechanisms, Gordon
S. Brown, OSRD 89, Progress Report 2, The Mas-
sachusetts Institute of Technology, November 1940.
Div. 7-821.1-M1
4. Antiaircraft Director T-15, OEMsr-358, Report to
the Services 62, Western Electric Company, Inc.,
August 1943. Div. 7-112.2-M6
5. The Analysis and Synthesis of Linear Servomecha-
nisms, Albert C. Hall, OSRD 2097, Report to the
Services 64, The Massachusetts Institute of Tech-
nology, May 1943. Div. 7-821.1-MS
6. Antiaircraft Director, T-15-E1, E. L. Norton,
OEMsr-858, Report to the Services 98, Bell Tele-
phone Laboratories, Inc., July 30, 1945.
Div. 7-112.2-M11
7. Theoretical Calculation on Best Smoothing of Posi-
tion Data for Gunnery Prediction, R. S. Phillips
and P. R. Weiss, OEMsr-262, AMP Note 11, Re-
port 532, The Massachusetts Institute of Tech-
nology, Radiation Laboratory, Feb. 16, 1944.
Div. 14-244.4-M1
AMP-703.4-M11
8. A Long Range, High- Angle Electrical Antiaircraft
Director [Final Report on T-10], C. A. Lovell,
NDCrc-127, Research Project 2, Division 7 Report
to the Services 80, Bell Telephone Laboratories,
Inc., June 24, 1944. Div. 7-112.2-M9
9. Flight Records of Pitch, Roll, and Yaw, taken in
a variety of bombers at Wright Field, Ohio, Sperry
Gyroscope Company, 1942-5.
10. Design and Performance of Data-Smoothing Net-
work, R. B. Blackman, OEMsr-262, Report MM-44-
110-38, [Bell Telephone Laboratories, Inc.], July 8,
1944.
11. Computer for Controlling Bombers from the
Ground, E. Lakatos and H. G. Och, OEMsr-262,
July 24, 1944.
12. A Position and Rate Smoothing Circuit for Ground-
Controlled Bombing Computers, R. B. Blackman,
OEMsr-262, Report MM-44-110-79, [Bell Telephone
Laboratories, Inc.], Aug. 21, 1944.
13. A Two-Servo Circuit for Smoothing Present Posi-
tion Coordinates and Rate in Antiaircraft Gun
Directors, R. B. Blackman, Contract W-30-069-
ORD-1448, Report MM-44-110-65, [Bell Telephone
Laboratories, Inc.], Sept. 27, 1944.
14. The Theory of Electrical Artificial Lines and Fil-
ters, A. C. Bartlett, John Wiley and Sons, Inc.,
1931, p. 28.
15. Network Analysis and Feedback Amplifier Design,
H. W. Bode, D. Van Nostrand Company, 1945.
15a. Ibid., Chapters 7, 8, 18, and 14
15b. Ibid., p. 813.
15c. Ibid., p. 326.
15d. Ibid., p. 801.
15e. Ibid., p. 38.
15f. Ibid., p. 12.
15g. Ibid., p. 78.
15h. Ibid., p. 110.
15i. Ibid., p. 133.
15 j. Ibid., Chapter 6.
16. Fundamental Theory of Servo-mechanisms, L. A.
MacColl, D. Van Nostrand Company, 1945.
17. Automatic Control Engineering, E. S. Smith, Mc-
Graw-Hill Book Company, Inc., 1944.
18. Die Lehre von den Kettenbrüchen, B. G. Teubner,
Leipzig, 1918.
19. "Transient Oscillations in Wave Filters," J. R.
Carson and O. J. Zobel, Bell System Technical
Journal, July 1923.
20. "Harmonic Analysis of Irregular Motion," Nor-
bert Wiener, Journal of Mathematics and Physics,
Vol. 5, 1926, pp. 99-189.
21. "Generalized Harmonic Analysis," Norbert Wie-
ner, Acta Mathematica, Stockholm, Vol. 55, 1930,
pp. 117-258.
22. "Stochastic Problems in Physics and Astronomy,"
S. Chandrasekhar, Review of Modern Physics, Vol.
15, 1943, pp. 1-89.
23. "Mathematical Analysis of Random Noise," S. O.
Rice, Bell System Technical Journal, Vol. 23, 1944,
pp. 282-332.
23a. Ibid., Vol. 24, 1945, pp. 46-156.
Cover Sheet for Technical Memoranda
Research Department
Subject: The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem - Case 20876
MM-46-110-49
Date: April 10, 1946
Authors: C. L. Dolph, C. E. Shannon
Index No. W1.416
Routing:
1 - H.W.BW.B.F. - H.F. - Case Files
2 - Case Files
3 - L.G. Abraham - T.E. Brewer
4 - C.H. Elmendorf - H.K. Krist
5 - H.S. Black - F.B. Anderson
6 - G.N. Thayer - C.W. Harrison
7 - R.L. Dietzold
8 - L.A. MacColl
9 - B.M. Oliver
10 - C.L. Dolph
11 - C.E. Shannon
ABSTRACT
Asymptotic expressions for the transient
response of a long chain of four-terminal unilateral
linear networks connected in tandem subject to an
initial disturbance are developed and classified accord-
ing to the characteristics of the common transfer ratio.
It is shown that a necessary and sufficient condition
for the stability of the chain for all n is that the
transfer ratio be of the high pass type.
The mathematical results are applied to
chains of self-regulating telephone repeaters.
The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem - Case 20878
MM-46-110-49
April 10, 1946
MEMORANDUM FOR FILE
Introduction
The transient response behavior of a long chain of
invariable four-terminal networks connected .unilaterally in
tandem is of primary importance in the design of cross-country
wire communication systems, since the successful operation of
such equipment depends upon the rapid damping of transients
caused by suddenly applied inputs.
While the emphasis in this memorandum will be directed toward coaxial systems consisting of self-regulating repeaters spaced at 3-7 mile intervals and spanning distant points, the results are of a more general nature and would apply, with obvious modifications and corresponding interpretations, to any configuration involving a large number of four-terminal linear invariable networks connected unilaterally in tandem.
It will be shown that there are two fundamentally different types of transient response possible, depending upon the gain characteristic of the transfer ratio of the individual four-terminal linear networks comprising the system. The first type of response, while satisfactory, is difficult to achieve in practice because of the stringent requirements on the gain characteristic of the transfer ratio. The second, a case often encountered in practice, will be shown to be unsatisfactory in general, since it leads to build-up and overloading in any physical system comprising a large number of such networks. However, a guiding design principle will be suggested which, it is believed, will enable us to minimize the worst of the effects and make the successful operation of a system of the type envisaged here possible.
This memorandum is divided into two parts. In the first, the problem is defined physically and then formulated mathematically. Following this, the history of the problem is discussed briefly, after which the new results are summarized. Finally, this part concludes with a discussion of their interpretation and implications for the coaxial system. The second part presents the detailed mathematical arguments which led to the new results of Part I.
PART I
Statement of the Problem
The analysis in this memorandum is directed toward
the understanding of certain anomalous effects which a long
chain of self-regulating telephone repeaters may exhibit at its
output when the input end of the chain is subject to a transient
disturbance (Cf. Figure 1).
The gain settings of the repeaters in such a chain are usually controlled by the level of a pilot frequency somewhere in the communication band, and the regulation is designed to compensate for low frequency phenomena (up to approximately one cycle per second) such as the diurnal change in line resistance. The repeaters in the chain are normally absolutely stable devices, so that any transient which is presented to the input of any one of them will be evanescent in time at the output of that repeater.
Since transients are not damped out instantaneously even in absolutely stable devices, a transient disturbance at the input to the first repeater in such a chain will be propagated down the chain. It has been experimentally observed that under certain conditions the maximum amplitude of a transient disturbance may increase as the disturbance is propagated from one repeater to the next, and in some cases there may be many oscillations of sufficiently large amplitude to render the system inoperative because of prolonged overloading.
If the entire chain from its input to its output end
is considered as a whole, the chain does behave then in many
respects like an unstable non-linear device in spite of the
fact that each repeater in the chain is absolutely stable.
Since it is obvious that the above type of behavior is at best undesirable in a cross-country link, it is necessary that its cause be thoroughly understood and that all possible steps be taken either to suppress it or, if this is not possible, at least to minimize its effects.
Although it is not reasonable to expect that transient
oscillations can be kept from propagating down the line, or that
it is possible to isolate the line from all transient disturbances,
it is reasonable to seek a means of guaranteeing that the tran-
sients that are propagated down the line will never possess
amplitudes that exceed the magnitude of the original disturbance
or to seek a way to guarantee that the maximum response of the
transient oscillations will occur so shortly after the initial
disturbance that physical apparatus will be incapable of follow-
ing or distinguishing it from the unavoidable initial disturbance.
A way of guaranteeing the first of these will be discussed at
length and a suggestion will be made which it is felt will
guarantee the second, although no rigorous proof of this last
fact has yet been given.
Fig. 2 represents a schematic drawing of a typical
satisfactory type of transient response which might result from
a unit step input to the first unit of Fig. 1. Fig. 3, on the
other hand, represents a schematic drawing of a typical unsatis-
factory type of transient response which could result from the
same input to a system of the type of Fig. 1 which had different
characteristics. Briefly then, the problem to be discussed is
that of determining the relationships between the network
characteristics and the transient response for networks of the
form of Fig. 1.
Mathematical Formulation of the Problem
A sudden change in level in the pilot frequency before the n-th repeater results in the modulation of this frequency, changing it from its normal form

$$A \sin \omega_c t$$

to

$$A \sin \omega_c t\,[1 + f(t)],$$

where $f(t)$ represents the modulation introduced by the transient.
After passage through the n-th repeater, this last expression is transformed into

$$A \sin(\omega_c t + \varphi)\,[1 + g(t)],$$

where the repeater and regulator have (possibly) changed the carrier by the addition of the phase angle $\varphi$ and have modified the original envelope $A[1 + f(t)]$ into $A[1 + g(t)]$.
It is clear that from the standpoint of regulation it is sufficient to limit discussion to the transformation of $f(t)$ into $g(t)$.*
The exact relationship between $f(t)$ and $g(t)$, of course, depends upon the characteristics of the repeater-regulator circuits, which are in general non-linear. However, for small signal inputs their behavior may be satisfactorily represented by that of a linear invariable four-terminal network. Thus, the chain of self-regulating repeaters may be replaced, for the purpose of mathematical analysis, by a chain of linear invariable four-terminal networks having a common transfer ratio $y(p)$. Accordingly, the blocks of Fig. 1 will be idealized as such linear four-terminal networks throughout the analysis.
Because regulation is designed to compensate for low frequency phenomena, certain characteristics that $y(p)$ should possess are known a priori, namely:
(1) $y(p)$ must represent a high-pass system. That is, $y(p) \to 1$ as $p \to \infty$.
(2) $y(0)$ should be zero if, in the terminology of servo theory, there is to be no static error.
In terms of y(p), the design of a self-regulating
system reduces to two problems:
(I) Given y(p), to calculate the transient behavior of
the chain of self-regulating repeaters,
(II) The design of a system having a y(p) which leads
to satisfactory transient behavior.
The rest of the memorandum will be concerned largely
with the first of these. The calculations will be carried out
in general terms and the different types of possible responses
will be described in terms of the characteristics of y(p).
* Transit time between repeaters is neglected throughout this memorandum. More exactly, we choose a different origin of time at each repeater, so that the transit time does not appear explicitly in the formulae.
Mathematically the problem discussed in this memorandum can be formulated as follows: If $y(p)$ represents the common steady-state transfer ratio of the four-terminal linear units shown connected in tandem in Figure 1, the output voltage response $V_n(t)$ of the n-th unit is given by the inverse Laplace integral

$$V_n(t) = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} y(p)^n\,e^{pt}\,V_0(p)\,dp, \qquad (1)$$

where $V_0(p)$ represents the spectrum of the input voltage.
For an impulsive input of intensity $V_0$ applied at time $t = 0$,

$$V_0(p) = V_0.$$

For a step function input of height $V_0$ applied at time $t = 0$,

$$V_0(p) = V_0/p.$$
Specifically, this memorandum will be devoted to the study of the behavior of $V_n(t)$ for large values of $n$.
Four-terminal networks are normally classed as low-, band-, or high-pass depending upon the character of $|y(i\omega)|$. Typical examples of $|y(i\omega)|$ are shown in Figure 4a, in which, following the usual practice, $|y(i\omega)|$ has been normalized to be unity at $\omega = 0$ in the low-pass case, at $\omega = \omega_0$ (the mid-band frequency) in the band-pass case, and at $\omega = \infty$ in the high-pass case.
From the viewpoint of the asymptotic behavior of the system in Figure 1, it is convenient to modify this classification somewhat when speaking of the over-all gain characteristic, $|y(i\omega)|^n$, of the transfer ratio of a system comprised of $n$ units. For sufficiently large $n$, it is clear that $|y(i\omega)|^n$ would lead to curves of the type shown in Figure 4b corresponding to the low-pass, band-pass and high-pass curves of Figure 4a. Thus, for sufficiently large $n$, the gain curves B*, C*, and D* of Figure 4b are seen to exhibit the type of behavior normally associated with a band-pass characteristic. A* and E*, on the other hand, exhibit behavior of the type normally classified as low-pass and high-pass, respectively. For these reasons, the terms low-pass and high-pass will henceforth be reserved for those gain characteristics which are always less than their values at $\omega = 0$ and $\omega = \infty$, respectively. The term band-pass will be used to cover all other cases, namely, those in which $|y(i\omega)|$ possesses one or more maxima at finite frequencies, the values of which exceed the values of $|y(i\omega)|$ at both zero and infinity.
History of the Problem
Several people have considered this problem in the
above mathematical form. Before proceeding to a discussion of
the results of the general theory, it will be instructive to
consider a few illustrative examples of their results.
Let*

$$y(p) = p/(p + 1). \qquad (2)$$

The gain characteristic is clearly of the high-pass type and satisfies conditions (1) and (2) stated above. If the input voltage is a unit step, then, by the theorem of residues,

$$V_n(t) = \frac{1}{(n-1)!}\left[\frac{d^{n-1}}{dp^{n-1}}\left(p^{n-1}e^{pt}\right)\right]_{p=-1} = e^{-t}L_{n-1}(t),$$

where $L_{n-1}(t)$ denotes the Laguerre polynomial of degree $n - 1$.
A plot of $V_n(t)$ for $n = 1, 2, \ldots, 10$ is shown in Figure 5. It is known that for large $n$

$$L_n(t) \cong \frac{1}{\sqrt{\pi}}\,e^{t/2}\,(nt)^{-1/4}\cos\left(2(nt)^{1/2} - \frac{\pi}{4}\right),$$

where $\cong$ is to be interpreted as "asymptotically equal to." Thus

$$V_n(t) \cong \frac{1}{\sqrt{\pi}}\,e^{-t/2}\,(nt)^{-1/4}\cos\left(2(nt)^{1/2} - \frac{\pi}{4}\right).$$

A plot of the approximate "envelope"

$$\frac{1}{\sqrt{\pi}}\,e^{-t/2}\,(nt)^{-1/4}$$

is given for $n = 50, 100, 150, 200$, and $250$ in Figure 6.

*This example was first treated by L. A. MacColl (MM-39-325<-166), 9/11/39, and W. H. Wise (UK-38-343-22), 8/2/38. The above treatment follows that of MacColl.
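The step response $V_n(t) = e^{-t}L_{n-1}(t)$ of this example is easy to tabulate. The sketch below evaluates Laguerre polynomials by their standard three-term recurrence (chosen here only to keep the code self-contained; it is not the memorandum's method) and compares the exact response with the asymptotic envelope. Since $|L_k(t)| \le e^{t/2}$ for $t \ge 0$ (a classical inequality), $|V_n(t)| \le e^{-t/2} \le 1$, so the largest value stays at 1, attained at $t = 0$, however many units are cascaded.

```python
import numpy as np


def laguerre(k, t):
    """Laguerre polynomial L_k(t) via the recurrence
    (j+1) L_{j+1} = (2j+1-t) L_j - j L_{j-1}."""
    t = np.asarray(t, dtype=float)
    prev, cur = np.ones_like(t), 1.0 - t
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - t) * cur - j * prev) / (j + 1)
    return cur


def step_response(n, t):
    """V_n(t) = exp(-t) L_{n-1}(t): n units of y(p) = p/(p+1), unit-step input."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t) * laguerre(n - 1, t)


def envelope(n, t):
    """Asymptotic envelope pi**-0.5 * exp(-t/2) * (n t)**-0.25, valid for t > 0."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t / 2) * (n * t) ** -0.25 / np.sqrt(np.pi)


t = np.linspace(0.0, 40.0, 4001)
for n in (1, 2, 10, 50):
    print(n, np.max(np.abs(step_response(n, t))))

# Exact response versus the asymptotic envelope at a few positive times.
n, ts = 200, np.array([1.0, 2.0, 5.0])
print(step_response(n, ts), envelope(n, ts))
```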
The response in this case is seen to be both amplitude and frequency modulated, the "instantaneous frequency" in the sense of frequency modulation theory being given by

$$\omega_i = \frac{d}{dt}\left[2(nt)^{1/2}\right] = \left(\frac{n}{t}\right)^{1/2},$$

while the envelope of the amplitude modulation is approximately
exponential. In particular, the type of behavior found here
can be considered satisfactory since there is no tendency for
the magnitude of the largest overshoot to increase without limit
as the number of repeaters is increased. As will be shown
later, this type of behavior is typical of any network having
a high-pass characteristic in the generalized sense of that term
as it has been defined above.
In MM-40-3500-92 dated 10/14/1940, J. G. Kreer and
J. H. Bollman concluded that the appropriate y(p) for a self-
regulating repeater employing a directly heated thermistor
element in the control device was given by
It should be observed that for o 4= 0 this transfer
ratio does possess static error. L. A. MacColl in MM-40-130-270
treated this case for Id < 1 and found that the system exhibited
essentially the same type of satisfactory behavior as that
discussed above.
(2) A slightly more complicated example* is given by

$$y(p) = \frac{p(p + a)}{(p + 1)^2}. \qquad (4)$$

It is easily seen that for $a < \sqrt{2}$, $|y(i\omega)|$ is a high-pass characteristic in that $|y(i\omega)| < 1$ for all finite $\omega$ and $|y(i\omega)| \to 1$ as $\omega \to \infty$. On the other hand, if $a > \sqrt{2}$, $|y(i\omega)|$ possesses a maximum greater than 1 at some finite frequency. $|y(i\omega)|$ is illustrated by curve I in Figure 7 for $a = 1.4$ (high-pass) and by Figure 8 for $a = 2$ (band-pass). The response $V_n(t)$ to a unit step function is shown in Figures 9 and 10 for these two cases with $n = 1, 2, \ldots, 9$. The character of the response is seen to be of a radically different kind for these two values of $a$.
For a = 1.4 the response is seen to be of the same
type as that encountered in the first example. For a = 2, on
the other hand, it seems to represent an oscillation in which
the magnitude of the largest overshoot is increasing without
limit as n tends to infinity. Later it will be shown that
this is in fact the case and that satisfactory operation is
impossible for a large number of repeaters in this case.
From this and other considerations L. A. MacColl conjectured that a necessary and sufficient condition that the response $V_n(t)$ be bounded for all $n$ was that the transfer ratio $y(p)$ have no net gain at any frequency. Mathematically expressed, a necessary and sufficient condition that

$$|V_n(t)| < M \quad \text{for all } n,$$

where $M$ is independent of $n$ and $t$, is that

$$\text{(M)} \qquad |y(i\omega)| < 1 \quad \text{for all real frequencies } \omega.$$

Physically, the condition on $y(i\omega)$ prevents the transfer ratio $|y(i\omega)|^n$ for a system using $n$ units from having a tremendous gain at any particular frequency.
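Condition (M) can be checked directly for example (4). Writing $x = \omega^2$, one has $|y(i\omega)|^2 = 1 + [(a^2 - 2)x - 1]/(x + 1)^2$, so for $a \le \sqrt{2}$ the gain stays below 1, while for $a > \sqrt{2}$ it peaks at $\omega^2 = a^2/(a^2 - 2)$ with $|y|^2 = 1 + (a^2 - 2)^2/[4(a^2 - 1)]$; for $a = 2$ this is $2/\sqrt{3} \approx 1.155$ at $\omega = \sqrt{2}$. The frequency grid and helper names below are illustrative.

```python
import numpy as np


def gain(omega, a):
    """|y(i omega)| for example (4), y(p) = p (p + a) / (p + 1)**2."""
    p = 1j * omega
    return np.abs(p * (p + a) / (p + 1) ** 2)


def peak_gain(a):
    """Closed-form maximum of |y(i omega)|, valid for a > sqrt(2)."""
    return np.sqrt(1 + (a**2 - 2) ** 2 / (4 * (a**2 - 1)))


omega = np.linspace(1e-3, 50.0, 200001)
for a in (1.4, 2.0):
    g = gain(omega, a)
    k = np.argmax(g)
    print(f"a = {a}: max |y| on grid = {g[k]:.6f} at omega = {omega[k]:.4f}")
print(peak_gain(2.0))
```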
*This case was also treated by L. A. MacColl, but no memorandum on it was ever written.
In one sense this memorandum could be summarized as
a proof of this conjecture. In particular, a direct proof of
the necessity of MacColl's condition (M) is given in the second
part. The remainder of that part is devoted to an indirect
proof of the sufficiency. The argument consists in exhibiting
the two types of possible responses; the first being that
associated with a y(p) satisfying MacColl's condition, and the
second that resulting from a y(p) which violates it at one or
more frequencies.
Statement of Results
The detailed results of the sufficiency argument are discussed conveniently in terms of the generalized characterization of high-, band-, and low-pass $y(p)$'s given above. The results will be taken up in that order.
High Pass
In terms of the above classification, the class of high-pass $y(p)$'s consists of just those functions which satisfy MacColl's condition, and these are therefore the ones from which a satisfactory response could be expected. For the $y(p)$'s in this class, it is clear on physical grounds that the maximum contribution to the response $V_n(t)$ of equation (1) will come from the large values of $|\omega|$, since for these values of $|\omega|$, $|y(i\omega)|^n \to 1$, while for all other values of $|\omega|$, $|y(i\omega)|^n \to 0$. Using the first three terms of the Laurent expansion of $y(i\omega)$ about $\omega = \infty$, one finds:
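$$y(i\omega) \cong 1 + \frac{ia}{\omega} + \frac{b}{\omega^2},^{*} \qquad (5)$$

$$|y(i\omega)| \cong \left(1 + \frac{a^2 + 2b}{\omega^2}\right)^{1/2}, \qquad (6)$$

$$\text{Angle } y(i\omega) \cong \frac{a}{\omega}. \qquad (7)$$

* It is assumed that $a > 0$, $b < 0$, and that $a^2 + 2b < 0$. These assumptions correspond to a second-order maximum at $|\omega| = \infty$ and to a monotonically decreasing phase function for $y(p)$ as $|\omega| \to \infty$.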
If these approximations, which are valid for $|\omega|$ sufficiently large, are introduced into equation (1), it can be shown that the principal contribution to $V_n(t)$ for a unit step input is given by:

$$V_n(t) \cong \pi^{-1/2}\,(nat)^{-1/4}\exp\left[\frac{a^2 + 2b}{2a}\,t\right]\cos\left(2(nat)^{1/2} - \frac{\pi}{4}\right). \qquad (8)$$
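For the family (4), $y(p) = p(p + \alpha)/(p + 1)^2$ (its parameter renamed $\alpha$ here to avoid a clash with the Laurent coefficient $a$), expanding in powers of $1/p$ gives $y = 1 + (\alpha - 2)/p + (3 - 2\alpha)/p^2 + \cdots$, so with $p = i\omega$ the coefficients of (5) are $a = 2 - \alpha$ and $b = 2\alpha - 3$. Then $a^2 + 2b = \alpha^2 - 2$, which is negative exactly when $\alpha < \sqrt{2}$, the high-pass condition; $\alpha = 1$ recovers $p/(p+1)$ with $a = 1$, $b = -1$ and the factor $e^{-t/2}$. For $\alpha = 1.4$ the exponent $(a^2 + 2b)/(2a)$ is only about $-0.033$, i.e. very weak damping. The sketch estimates $a$ and $b$ numerically from a single large frequency; that estimation shortcut is an illustrative choice.

```python
def y(p, alpha):
    """Example (4) with its parameter renamed alpha: y(p) = p (p + alpha)/(p + 1)**2."""
    return p * (p + alpha) / (p + 1) ** 2


def laurent_ab(alpha, omega=1e3):
    """Estimate a and b in y(i w) ~ 1 + i a / w + b / w**2 (eq. 5) at one large w."""
    val = y(1j * omega, alpha)
    a = omega * val.imag
    b = omega**2 * (val.real - 1.0)
    return a, b


for alpha in (0.0, 1.0, 1.4):
    a, b = laurent_ab(alpha)
    rate = (a**2 + 2 * b) / (2 * a)  # coefficient of t in the exponential factor
    print(f"alpha={alpha}: a={a:.5f} (2-alpha={2 - alpha:.1f}), "
          f"b={b:.5f} (2alpha-3={2 * alpha - 3:.1f}), rate={rate:.4f}")
```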
This, with a suitable interpretation of the constants $a$ and $b$, is seen to be of the same general form as the response obtained by MacColl for $y(p) = p/(p + 1)$ as given by equation (…). Just as in that example, the response is both frequency and amplitude modulated. The instantaneous frequency of oscillation is again given by

$$\omega_i = \left(\frac{na}{t}\right)^{1/2}.$$
The gain for
y(p) = P(P i
(P I D2
is shown on curve I of Figure 11. Curve II of this figure represents $|y(i\omega)|^{100}$ for this $y(p)$. For this example and $n = 100$, the true gain $|y(i\omega)|^{100}$ and the gain approximation resulting from equation (6) are indistinguishable on the scale of Figure 11.
The corresponding phase characteristic for $y(p)^{100}$ is plotted on Figure 12 where, for reasons which will appear in Part II, the actual frequency has been replaced by

$$\omega^* = \frac{\omega}{\sqrt{n}}.$$

Again, on the scale of Figure 12 the actual phase is indistinguishable from the approximation resulting from equation (7).
Figs. 7 and 13 present the same information for

$$y(p) = \frac{p(p + 1.4)}{(p + 1)^2}$$

and $n = 100$.
Again the agreement between the actual phase and the approximation is excellent. However, there is a considerable error in the gain approximation for small $|\omega|$. This large error is unquestionably due to the fact that the value $a = 1.4$ is near the critical value $a = \sqrt{2}$ at which the characteristic changes from high-pass to band-pass.
Agreement with the above asymptotic formula can of course be obtained by increasing $n$ sufficiently. Alternately, for $n = 100$, a better approximation to the gain can be obtained by writing

$$y(i\omega) \cong 1 + \frac{ia}{\omega} + \frac{b}{\omega^2} + \frac{ic}{\omega^3} + \frac{d}{\omega^4}$$

and

$$|y(i\omega)| \cong \left(1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4}\right)^{1/2}.$$

This approximation leads to a curve which is indistinguishable from that of $|y(i\omega)|^{100}$ in Figure 7. With this approximation, one finds the following expression for $V_n(t)$ when the input is a unit step function:

$$V_n(t) \cong \pi^{-1/2}(nat)^{-1/4}\cos\left(2(nat)^{1/2} - \frac{\pi}{4}\right)\exp\left[\frac{a^2 + 2b}{2a}\,t\right]\left[1 + \frac{(2d + b^2 + 2ac)\,t^2}{2na^2}\right].$$

This expression is seen to approach that given by equation (8) as $n \to \infty$. Thus one can conclude that the response will always be satisfactory if $y(p)$ belongs to the class of high-pass characteristics.
Band-Pass Case
MacColl's condition is clearly violated whenever $|y(i\omega)|$ has one or more relative maxima greater than 1 at finite frequencies. For simplicity, the case where $|y(i\omega)|$ has only one such maximum, at $\omega = \omega_0$, will be treated first. It will furthermore be assumed that this maximum is of the second order, i.e.

$$\left.\frac{d^2}{d\omega^2}|y(i\omega)|\right|_{\omega = \omega_0} \ne 0.$$

Under these conditions, it is physically clear that the maximum contribution to the response $V_n(t)$ as given by equation (1) will be due to those frequencies near $\omega_0$, at which $|y(i\omega)|$ possesses its maximum, since as $n$ increases this region becomes increasingly more important than all the rest. It is also clear that the time of maximum response will be given by the delay time experienced by the frequency $\omega_0$ in passing through the network. This is known to be given by $t_0 = -nB'(\omega_0)$, where $B'(\omega_0)$ denotes the slope of the phase characteristic $B(\omega)$ in the expression

$$y(i\omega) = A(\omega)\exp\left(iB(\omega)\right). \qquad (10)$$

If $A(\omega)$ and $B(\omega)$ are expanded in a Taylor's series about $\omega = \omega_0$ and terms up to the second order retained, it can be shown that the response to a unit impulse function is given by
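$$V_n(t) \cong \frac{A(\omega_0)^n}{\sqrt{n}}\,G(\omega_0)\exp\left[-\frac{(t - t_0)^2}{2nH(\omega_0)}\right]\cos\left[\omega_0 t + nB(\omega_0) + \varphi(\omega_0) - \frac{(t - t_0)^2\,B''(\omega_0)}{2n\left[\left(A''(\omega_0)/A(\omega_0)\right)^2 + \left(B''(\omega_0)\right)^2\right]}\right], \qquad (11)$$

where

$$G(\omega_0) = \sqrt{\frac{2}{\pi}}\left[\left(\frac{A''(\omega_0)}{A(\omega_0)}\right)^2 + \left(B''(\omega_0)\right)^2\right]^{-1/4},$$

$$H(\omega_0) = \frac{\left(A''(\omega_0)/A(\omega_0)\right)^2 + \left(B''(\omega_0)\right)^2}{-A''(\omega_0)/A(\omega_0)} > 0,$$

$$\varphi(\omega_0) = \frac{1}{2}\arctan\left(-\frac{B''(\omega_0)\,A(\omega_0)}{A''(\omega_0)}\right),$$

$$t_0 = -nB'(\omega_0).$$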
Thus $V_n(t)$ can be interpreted as an amplitude-modulated wave with an envelope proportional to the Gauss error curve

$$\exp\left[-\frac{(t - t_0)^2}{2nH(\omega_0)}\right]$$

with a standard deviation given by

$$\sigma = \left[n\,\frac{\left(A''(\omega_0)/A(\omega_0)\right)^2 + \left(B''(\omega_0)\right)^2}{-A''(\omega_0)/A(\omega_0)}\right]^{1/2}.$$

The standard deviation $\sigma$ is of course a convenient measure of the duration of the disturbance. The maximum response occurs for time $t_0 = -nB'(\omega_0)$, at which time the amplitude is proportional to

$$\frac{A(\omega_0)^n}{\sqrt{n}}.$$

Thus if $A(\omega_0) > 1$, the maximum response will represent a value which is very large compared with unity, the magnitude of the original disturbance, if $n$ is large. This would force any system involving vacuum tubes to overload if $n$ were sufficiently large.
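For the band-pass example (4) with $a = 2$, the quantities entering the envelope can be evaluated directly. Analytically, $\omega_0 = \sqrt{2}$, $A(\omega_0) = 2/\sqrt{3}$, $A''(\omega_0)/A(\omega_0) = -2/9$, $B'(\omega_0) = -1/3$ and $B''(\omega_0) = \sqrt{2}/3$, so $t_0 = n/3$ and $\sigma^2 = n[(A''/A)^2 + B''^2]/(-A''/A) = 11n/9$. The sketch below recovers these with a ternary search and finite differences (both choices made here, not taken from the memorandum); for $n = 100$ the growth factor $A(\omega_0)^n/\sqrt{n}$ is already about $1.8 \times 10^5$.

```python
import numpy as np

A_EX = 2.0  # the value a = 2 of example (4)


def y(omega):
    p = 1j * omega
    return p * (p + A_EX) / (p + 1) ** 2


def gain(omega):
    return abs(y(omega))


def phase(omega):
    # No branch-cut crossing near omega0 for this example, so no unwrapping needed.
    return np.angle(y(omega))


def peak_frequency(lo=0.5, hi=5.0, iters=200):
    """Ternary search for the single finite-frequency maximum of |y(i w)|."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if gain(m1) < gain(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def bandpass_parameters(n, h=1e-4):
    w0 = peak_frequency()
    A0 = gain(w0)
    A2 = (gain(w0 + h) - 2 * A0 + gain(w0 - h)) / h**2                 # A''(w0)
    B1 = (phase(w0 + h) - phase(w0 - h)) / (2 * h)                      # B'(w0)
    B2 = (phase(w0 + h) - 2 * phase(w0) + phase(w0 - h)) / h**2         # B''(w0)
    kappa = -A2 / A0
    H = (kappa**2 + B2**2) / kappa
    return {"w0": w0, "A0": A0, "t0": -n * B1, "H": H,
            "sigma": np.sqrt(n * H), "peak_scale": A0**n / np.sqrt(n)}


print(bandpass_parameters(100))
```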
These properties are summarized in Figures (14) and (15). Figure (14) is a plot of the response for values of $t$ near $t_0$ for a few values of $n$ for the example given by equation (4) where $a = 2$. Figure (15) is a plot of the maximum response for a few values of $n$ for different values of the parameter $a$.
It should be remarked that the above approximation to the gain, which was obtained by keeping only the first two terms of the expansion of $A(\omega)$ about $\omega = \omega_0$, could only be expected to be a reasonable one for fairly large values of $n$, since it represents a usually unsymmetric gain characteristic by a symmetric function. A better, or second, approximation can be obtained by using three terms of the Taylor's expansion instead of two. Just as in the high-pass case, the retention of this extra term gives rise to a second term in the expression for $V_n(t)$, but it does not fundamentally alter the characteristics of the response, since the correction term vanishes for $t = t_0$, at which time the response is still a maximum, with the same amplitude as before. Its only effect is to take cognizance of the unsymmetrical character of the gain characteristic $A(\omega)$ and to change the resulting response envelope to an unsymmetrical one. Of course, it also modifies the phase of the oscillation inside the envelope in a complicated way without changing the fundamental frequency of oscillation.
For these reasons, and because of the complexity of the resulting expression, it will not be written down here explicitly, although the explicit approximation to the gain $A(\omega)$ will be discussed in Part II.
The two approximations to the gain are illustrated for equation (4) with
a = 2 in Figure 16 for n = 100. In this case
$$A(\omega) = \frac{|\omega|\sqrt{\omega^2 + 4}}{\omega^2 + 1}.$$
As can be seen from the figure, the second approximation does in fact
represent A(ω) over the significant range of frequencies near ω₀, from
which it can be concluded that the response will be unsatisfactory.
Figure 14, previously referred to, furnishes a picture of the envelope
response as obtained from the first approximation.
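As a numerical illustration, assume that the gain of the a = 2 example is A(ω) = |ω|√(ω² + 4)/(ω² + 1), realized by the hypothetical transfer ratio y(p) = p(p + 2)/(p + 1)² (a choice made here for illustration; equation (4) itself is not reproduced in this section). The sketch below locates the peak ω₀ by golden-section search, estimates B'(ω₀) by a central difference, and evaluates the time of maximum response t₀ = -nB'(ω₀) and the growth factor A(ω₀)ⁿ. For this ratio one expects ω₀ = √2, A(ω₀) = 2/√3 > 1 and B'(ω₀) = -1/3.

```python
import cmath
import math


def y(p: complex) -> complex:
    # Hypothetical transfer ratio with gain |w| sqrt(w^2 + 4) / (w^2 + 1)
    return p * (p + 2) / (p + 1) ** 2


def gain(w: float) -> float:
    return abs(y(1j * w))


def phase(w: float) -> float:
    # B(w) = arg y(iw); for w > 0 it stays in (0, pi/2), so no branch cut is crossed
    return cmath.phase(y(1j * w))


def golden_max(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Maximum of a unimodal function on [lo, hi] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2


def deriv(f, x: float, h: float = 1e-5) -> float:
    # central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


n = 100
w0 = golden_max(gain, 0.1, 10.0)
A0 = gain(w0)
B1 = deriv(phase, w0)
t0 = -n * B1  # time of maximum response
print(f"w0 = {w0:.6f}  (sqrt 2 = {math.sqrt(2):.6f})")
print(f"A(w0) = {A0:.6f}  (2/sqrt 3 = {2 / math.sqrt(3):.6f})")
print(f"B'(w0) = {B1:.6f},  t0 = {t0:.3f}")
print(f"A(w0)^n = {A0 ** n:.3e}")
```

With n = 100 the factor A(ω₀)ⁿ is of the order of a million, which is the overload the text warns about; the same finite-difference approach would give A''(ω₀) and B''(ω₀) if the envelope width were wanted.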
In the event that A(ω) takes on its maximum value at more than one place in
the finite frequency range, it is clear that the above results can be
generalized as follows. Let V_{ni}(t) be the response, of the form given by
equation (11), due to a maximum at ω = ω_i, and let the time of maximum
response from this maximum be denoted by t_i = -nB'(ω_i). Then, if there
are k relative maxima, the total response is clearly given by the expression
$$V_n(t) = \sum_{i=1}^{k} V_{ni}(t).$$
Unless the values of A(ω) at the points ω = ω_i are nearly the same, it is
also clear that only those terms of the above sum which correspond to the
largest maxima of A(ω) will be of significance.
The band-pass case is also discussed briefly for unit
step inputs in Part II.
Low Pass Case
Since the low-pass case differs from the band-pass case only in that A(ω)
has its maximum at ω = 0 instead of at ω = ω₀ ≠ 0, the results of the two
are very similar. The results in the low-pass case are simpler because, it
will be recalled, B(ω) (as defined by equation 10) is an odd function of ω
for any physical network. This forces both B(0) and B''(0) to be zero, so
that for an impulsive input one obtains the simple formula
(12) $$V_n(t) \approx \frac{A(0)^n}{\sqrt{2\pi n}}\left(\frac{|A''(0)|}{A(0)}\right)^{-1/2}\exp\left[-\frac{(t-t_0)^2\,A(0)}{2n\,|A''(0)|}\right], \qquad t_0 = -nB'(0).$$
This result corresponds to the well-known formula from transmission line
theory for non-distortionless lines.
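The low-pass formula can be checked against an exactly solvable case. Assume, purely for illustration, the transfer ratio y(p) = 1/(1 + p). Then A(ω) = (1 + ω²)^(-1/2), so A(0) = 1 and A''(0) = -1, and B(ω) = -arctan ω, so t₀ = -nB'(0) = n. The exact impulse response of n such stages is the inverse Laplace transform of (1 + p)^(-n), namely t^(n-1)e^(-t)/(n-1)!, while the low-pass formula predicts a Gaussian of amplitude A(0)ⁿ/√(2πn|A''(0)|/A(0)) centred at t₀. The sketch compares the two.

```python
import math


def exact_impulse(n: int, t: float) -> float:
    # Inverse Laplace transform of (1 + p)^(-n): t^(n-1) e^(-t) / (n-1)!, in log space
    return math.exp((n - 1) * math.log(t) - t - math.lgamma(n))


def formula_12(n: int, t: float, A0: float = 1.0, A2: float = -1.0,
               B1: float = -1.0) -> float:
    # Low-pass formula with A(0) = A0, A''(0) = A2 < 0 and B'(0) = B1
    t0 = -n * B1
    amp = A0 ** n / math.sqrt(2 * math.pi * n * abs(A2) / A0)
    return amp * math.exp(-(t - t0) ** 2 * A0 / (2 * n * abs(A2)))


n = 400
for t in (n - 2 * math.sqrt(n), float(n), n + 2 * math.sqrt(n)):
    e, g = exact_impulse(n, t), formula_12(n, t)
    print(f"t = {t:6.1f}  exact = {e:.6e}  (12) = {g:.6e}  ratio = {e / g:.4f}")
```

Here the asymptotic formula is just the normal approximation to the gamma density; the ratio is very close to 1 at t = t₀ and departs from 1 by a few per cent two standard deviations away, reflecting the asymmetry neglected by the first approximation.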
Remarks
From the practical viewpoint the above results have the following
implications for communication systems such as a cross-country coaxial
telephone system employing self-regulating repeaters spaced at intervals of
a few miles.
(1) If the transfer characteristic of each individual
network is of the high-pass type (in the sense in which this term
has been used above) then the transient response will never exceed
the initial value of the disturbing input voltage and it will
be damped out so that the operation of the communication system
would generally be considered satisfactory.
(2) If the network is not of the high-pass type (the usual practical case)
and there is any net gain in the system peaked at ω₀, then even for a small
number of units the response will exceed the initial input at the time
given by
$$t_0 = -nB'(\omega_0),$$
where
$$A'(\omega_0) = 0,$$
and if the number of units is sufficiently large the output from the n-th
unit will be large enough to cause severe overloading.
At first glance these implications are not promising and seem to indicate
that the operation of a cross-country system involving several hundred
repeaters and regulators would be extremely difficult, since the only
satisfactory characteristic is difficult to attain in practice. However,
in practice the ideal high-pass characteristic can be approached in the
sense that the peak frequency can be made very large. The maximum response
may then occur so soon after the initial disturbance that the physical
system would not be able to follow it or to distinguish it from the initial
disturbance, which in many cases would be large enough to cause momentary
overloading of the system.
Moreover, it is an experimental fact that, in the design of feedback
regulator characteristics, forcing the peak frequency higher reduces the
size of the peak, which in turn permits the use of a larger number of
regulators in the system.
If this is done, the time of maximum response, t₀ = -nB'(ω₀), will be
small, since B'(ω) is in general small for large ω. Assuming that the
effects of the maximum response have been treated in this way, it is
natural to inquire into the type of response which will result for finite
values of t > t₀.
If one examines a gain characteristic curve of the type shown in Figure 7,
it is clear that for frequencies less than some frequency ω₁, slightly less
than the peak frequency ω₀, the shape is fundamentally like that of the
high-pass case. Remembering that the delay of a frequency through a linear
network is given by the slope of the phase characteristic at that
frequency, it is clear that the response for values of t greater than t₀,
the time of maximum response, will come from the frequencies less than ω₀,
since the phase slope is large for small frequencies and small for large
frequencies. Now if it is assumed that the phase characteristic nB(ω) is a
monotonic decreasing function of ω, it is clear that the function
nB(ω) + ωt will always be stationary at an arbitrary frequency ω, provided
that t is given the suitable corresponding value t = -nB'(ω). Thus it is
reasonable to expect that the response for t ≫ t₀* will exhibit the same
type of character as that obtained in the high-pass case discussed above.
This, it will be recalled, is both frequency and amplitude modulated, with
an envelope which decreases approximately exponentially. Under these
circumstances it therefore seems reasonable to suppose that satisfactory
operation of the communication link could be obtained.
To recapitulate, the most practical design for any system of the type
envisaged in Figure 1, from the viewpoint of satisfactory transient
response, involves approaching the high-pass characteristic as closely as
possible by making the gain characteristic of the transfer ratio peak at as
high a frequency as is practicable, and by keeping the phase slope
characteristic monotonic for all smaller frequencies.
PART II
Mathematical Discussion
Theorem I. A necessary condition that the response Vn(t) from a chain of n
four-terminal linear invariable networks subject to a unit step input
function have a common finite bound for all n is that the transfer ratio
y(p) satisfy the relation
(M) $$|y(i\omega)| \le 1 \quad \text{for all real values of } \omega.$$
* A different type of expansion, valid for any fixed t as n → ∞, is
discussed at the end of Part II.
Proof: By hypothesis
$$|V_n(t)| < M \quad \text{for all } n,$$
where M is independent of n and t. The Laplace transform of the response is
$$V_n(p) = \int_0^\infty e^{-pt}\,V_n(t)\,dt,$$
and for a unit step input V_n(p) = y(p)^n/p, so that y(p)^n = pV_n(p).
Hence, writing p = c + iω with c > 0,
$$|y(p)|^n = |p|\left|\int_0^\infty e^{-pt}\,V_n(t)\,dt\right| \le |p|\int_0^\infty e^{-ct}\,|V_n(t)|\,dt \le \frac{|p|\,M}{c}.$$
Since |p| = √(c² + ω²), this gives
$$\log|y(p)| \le \frac{1}{n}\log\frac{M\sqrt{c^2+\omega^2}}{c}.$$
Thus, in the limit as n → ∞, it follows that for any p with a positive real
part
$$\log|y(p)| \le 0,$$
and hence
$$|y(p)| \le 1.$$
Since this relation holds everywhere in the right-hand half plane, it
follows from simple continuity considerations that |y(iω)| never exceeds 1.
Thus
$$|y(i\omega)| \le 1,$$
as was to be shown.
The remaining discussion will be devoted to the
characterization of the different types of possible responses
and will, at the same time, furnish an indirect proof of the
fact that the condition (M) on y(p) is also sufficient.
High Pass Case - Unit Step Input
If the networks comprising the system shown in Figure 1 possess a transfer
ratio having a high-pass gain characteristic in the sense defined above,
and if one writes
$$y(i\omega) = A(\omega)\,e^{iB(\omega)},$$
then the gain function A(ω) satisfies the two conditions
(A) A(ω) < 1 for all finite frequencies ω;
(B) A(ω) → 1 as ω → ∞.
Under these conditions it is clear that, for sufficiently large n, the main
contributions to Vn(t) will be due to the high values of |ω|. For
convenience, Vn(t) is written here in a slightly different form:
$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\int_0^\infty A(\omega)^n\,e^{i[nB(\omega)+\omega t]}\,\frac{d\omega}{i\omega}.$$
For large values of |ω|, all physical transfer ratios y(iω) of interest to
us here can be represented by an expansion of the form*
(13) $$y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2} + \frac{ci}{\omega^3} + \frac{d}{\omega^4} + \cdots$$
* In the usual case y(p) is a rational function, so that this expansion
can be readily obtained.
We shall confine our attention to the ordinary case, in which a > 0, b < 0
and 2b + a² < 0. For large values of |ω| we now have
(14) $$A(\omega) = \left[\left(1 + \frac{b}{\omega^2} + \frac{d}{\omega^4} + \cdots\right)^2 + \left(\frac{a}{\omega} + \frac{c}{\omega^3} + \cdots\right)^2\right]^{1/2}$$
(15) $$B(\omega) = \arctan\frac{\dfrac{a}{\omega} + \dfrac{c}{\omega^3} + \cdots}{1 + \dfrac{b}{\omega^2} + \dfrac{d}{\omega^4} + \cdots}$$
It is clear that, for |ω| sufficiently large, the leading terms of these
expressions will furnish adequate approximations to A(ω) and B(ω). These
are:
(16) $$A(\omega) = \left[1 + \frac{a^2 + 2b}{\omega^2}\right]^{1/2}$$
(17) $$B(\omega) = \frac{a}{\omega}$$
Let ω₀ be the frequency defined by the condition that these approximations
are accurate to within the arbitrarily chosen permissible error ε for
values of ω such that ω > ω₀. Then we can write
$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{\int_0^{\omega_0} A(\omega)^n e^{i[nB(\omega)+\omega t]}\,\frac{d\omega}{i\omega} + \int_{\omega_0}^{\infty} A(\omega)^n e^{i[nB(\omega)+\omega t]}\,\frac{d\omega}{i\omega}\right\} = \frac{1}{\pi}\,\mathrm{Re}\,(I_1 + I_2).$$
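It is clear that
$$|I_1| \le \int_0^{\omega_0} \frac{A(\omega)^n}{|\omega|}\,d\omega.$$
Since [A(ω)]ⁿ → 0 for each ω in the finite range 0 < ω < ω₀, it is clear
that |I₁| can be made negligibly small by taking n sufficiently large.
Introducing the new variable v defined by the relation
$$v = \omega\sqrt{\frac{t}{na}},$$
and using the approximations (16) and (17), I₂ can be written as
$$I_2 = \int_{v_0}^{\infty}\left[1 + \frac{(a^2+2b)\,t}{nav^2}\right]^{n/2} e^{i\sqrt{nat}\,(v + 1/v)}\,\frac{dv}{iv}, \qquad v_0 = \omega_0\sqrt{\frac{t}{na}}.$$
Letting
$$x = \frac{(a^2+2b)\,t}{av^2}$$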
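and using the binomial expansion, one has
$$\left[1 + \frac{x}{n}\right]^{n/2} = 1 + \frac{x}{2} + \frac{\frac{n}{2}\left(\frac{n}{2}-1\right)}{2!}\left(\frac{x}{n}\right)^2 + \cdots = 1 + \frac{x}{2} + \frac{1}{8}\left(1 - \frac{2}{n}\right)x^2 + \cdots = e^{x/2} + \text{terms in } \frac{1}{n}.$$
Thus, for sufficiently large n, I₂ becomes, approximately,
(18) $$I_2 \approx \int_{v_0}^{\infty} e^{(a^2+2b)t/2av^2}\, e^{i\sqrt{nat}\,(1/v + v)}\,\frac{dv}{iv}.$$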
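The passage from the binomial expansion to the exponential can be checked numerically. The sketch below uses an arbitrary illustrative value x = -3 (negative, as it is here, since a > 0 and a² + 2b < 0) and shows that (1 + x/n)^(n/2) approaches e^(x/2) with an error shrinking like 1/n. From log(1 + u) = u - u²/2 + ⋯ the leading error is about -e^(x/2)x²/(4n), which is printed alongside for comparison.

```python
import math


def binomial_factor(x: float, n: int) -> float:
    # (1 + x/n)^(n/2), the amplitude factor before the limit is taken
    return (1 + x / n) ** (n / 2)


x = -3.0  # illustrative; negative because a^2 + 2b < 0 and a > 0
for n in (10, 100, 1000, 10000):
    err = binomial_factor(x, n) - math.exp(x / 2)
    predicted = -math.exp(x / 2) * x ** 2 / (4 * n)
    print(f"n = {n:6d}  error = {err:+.3e}  predicted = {predicted:+.3e}")
```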
In this form the principle of stationary phase can be applied to I₂ (cf.
Appendix I), for the amplitude factor
$$\frac{1}{v}\,e^{(a^2+2b)t/2av^2}$$
is independent of n, while the phase function (in the notation of the
appendix)
(19) $$\varphi(v) = \frac{1}{v} + v$$
is monotonic in the range of integration on each side of the stationary
point v = 1, where
$$\varphi'(v) = 0.$$
Physically speaking, the form of equation (18) suggests the interpretation
of Vn(t) as the sum of an infinite number of complex waves whose amplitudes
are slowly varying functions of v and whose complex phases are rapidly
varying functions of v. Under this interpretation it is physically
reasonable to expect that wave interference will occur everywhere except
near v = 1, where the phase function given by equation (19) is stationary.
This is the principle of stationary phase. It remains to evaluate the
principal contribution to I₂ for values of v near 1. Replacing φ(v) by the
first three terms of its Taylor series about v = 1,
$$\varphi(v) = \varphi(1) + 0 + \frac{\varphi''(1)}{2}(v-1)^2 = 2 + (v-1)^2,$$
the main contribution to I₂ is given by
$$I_2 \approx e^{i(2\sqrt{nat} - \pi/2)}\int_{1-\eta}^{1+\eta} \frac{1}{v}\,e^{(a^2+2b)t/2av^2}\,e^{i\sqrt{nat}\,(v-1)^2}\,dv.$$
In the interval (1 - η, 1 + η) the amplitude factor
$$\frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]$$
is substantially constant and may be removed from under the integral sign
and evaluated at v = 1. By the reasoning of Appendix I, the contributions
to the remaining integral are not appreciably affected if the limits are
changed to (-∞, ∞). Letting
$$\xi = v - 1,$$
we can then write I₂ in the form
$$I_2 \approx \exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[i\left(2\sqrt{nat} - \frac{\pi}{2}\right)\right]\int_{-\infty}^{\infty} e^{i\sqrt{nat}\,\xi^2}\,d\xi.$$
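By the known properties of Fresnel integrals,
$$\int_{-\infty}^{\infty} e^{i\sqrt{nat}\,\xi^2}\,d\xi = \sqrt{\pi}\,(nat)^{-1/4}\,e^{i\pi/4},$$
and hence
$$I_2 \approx \sqrt{\pi}\,(nat)^{-1/4}\exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[i\left(2\sqrt{nat} - \frac{\pi}{4}\right)\right].$$
Taking the real part and dividing by π, the asymptotic expression for
Vn(t) is therefore given by
(20) $$V_n(t) = \pi^{-1/2}\,(nat)^{-1/4}\exp\left[\frac{(a^2+2b)t}{2a}\right]\cos\left(2\sqrt{nat} - \frac{\pi}{4}\right),$$
which is equation (8) of Part I.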
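The high-pass step-response formula can be tested on a case with a known exact answer. Assume, for illustration only, the transfer ratio y(p) = p/(p + 1). Then y(iω) = 1 + i/ω - 1/ω² + ⋯, so in the notation y(iω) = 1 + ai/ω + b/ω² + ⋯ one has a = 1, b = -1 and a² + 2b = -1 < 0, as required. The exact unit-step response of n stages is the inverse transform of p^(n-1)/(p + 1)^n, which is e^(-t)L_{n-1}(t) with L_k the Laguerre polynomial, and the asymptotic formula predicts π^(-1/2)(nt)^(-1/4)e^(-t/2)cos(2√(nt) - π/4), the classical large-n asymptotic form of the Laguerre polynomials. The sketch compares the two.

```python
import math


def laguerre(k: int, x: float) -> float:
    # L_k(x) by the three-term recurrence (j+1) L_{j+1} = (2j+1-x) L_j - j L_{j-1}
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - x) * cur - j * prev) / (j + 1)
    return cur


def exact_step(n: int, t: float) -> float:
    # inverse Laplace transform of p^(n-1) / (p + 1)^n
    return math.exp(-t) * laguerre(n - 1, t)


def eq20(n: int, t: float, a: float = 1.0, b: float = -1.0) -> float:
    # asymptotic high-pass step response, formula (20)
    return (math.pi ** -0.5 * (n * a * t) ** -0.25
            * math.exp((a * a + 2 * b) * t / (2 * a))
            * math.cos(2 * math.sqrt(n * a * t) - math.pi / 4))


n = 10_000
for t in (0.5, 1.0, 2.0):
    print(f"t = {t}: exact = {exact_step(n, t):+.5f}, (20) = {eq20(n, t):+.5f}")
```

The agreement improves as n grows; the remaining discrepancy comes mainly from the small phase offset between using n and n - 1 and from the neglected higher-order terms.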
A more accurate approximation to the gain A(ω)ⁿ is given by
$$A(\omega) = \left[1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4}\right]^{1/2},$$
where the first three terms of equation (13) have been retained. From this
it follows that
$$A(\omega)^n \approx \exp\left[\frac{n}{2}\left(\frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4}\right)\right] = \exp\left[\frac{n(2b + a^2)}{2\omega^2}\right]\exp\left[\frac{n(2d + b^2 + 2ac)}{2\omega^4}\right],$$
from which it follows that the second approximation is obtained by
multiplying the first by the factor
$$\exp\left[\frac{n(2d + b^2 + 2ac)}{2\omega^4}\right].$$
If the frequency transformation v = ω√(t/(na)) is now made, the first
factor will as before be independent of n. Over the range of integration
where the integral is significant, their product can be removed from under
the integral sign, giving
$$V_n(t) = \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat} - \frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[\frac{(2d + b^2 + 2ac)t^2}{2a^2 n}\right]$$
$$\approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat} - \frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\left[1 + \frac{(2d + b^2 + 2ac)t^2}{2a^2 n} + \cdots\right],$$
which is equation (9) of Part I.
Band Pass Case - Impulsive Input
For simplicity let it be assumed that the gain characteristic A(ω) has only
one absolute maximum at ω = ω₀ on the positive frequency range, and that
this is a second-order maximum.
The response Vn(t) can always be written in the form
$$V_n(t) = \frac{A(\omega_0)^n}{\pi}\,\mathrm{Re}\int_0^{\infty} e^{\,n\log[A(\omega)/A(\omega_0)] + inB(\omega) + i\omega t}\,d\omega.$$
In this form, Vn(t) can again be interpreted as being proportional to the
sum of an infinite number of complex waves of amplitude
$$e^{\,n\log[A(\omega)/A(\omega_0)]}$$
with varying complex phase* given by
$$\varphi(\omega, t, n) = nB(\omega) + \omega t.$$
With this interpretation it is clear that the maximum contribution to Vn(t)
will be given by those frequencies in the neighborhood of ω₀, where ω₀
satisfies A'(ω) = 0, and at values of the time t near t₀ at which the phase
function φ(ω, t, n) is stationary for the maximum frequency ω₀. Thus t₀ is
given by
$$t_0 = -nB'(\omega_0).$$
Since A(ω₀) ≠ 0 and A'(ω₀) = 0, one can write, for a suitably small
neighborhood of ω₀,
$$\log\frac{A(\omega)}{A(\omega_0)} = \frac{A''(\omega_0)}{2A(\omega_0)}(\omega-\omega_0)^2 + \frac{A'''(\omega_0)}{6A(\omega_0)}(\omega-\omega_0)^3 + \cdots$$
If we retain only the first term of this expansion, then for a suitably
restricted neighborhood of ω₀ one has
$$e^{\,n\log[A(\omega)/A(\omega_0)]} \approx \exp\left[\frac{nA''(\omega_0)}{2A(\omega_0)}(\omega-\omega_0)^2\right].$$
Similarly, for ω sufficiently near ω₀,
(23) $$B(\omega) = B(\omega_0) + B'(\omega_0)(\omega-\omega_0) + \frac{B''(\omega_0)}{2}(\omega-\omega_0)^2.$$
Henceforth, for simplicity, we shall write
A = A(ω₀), A'' = A''(ω₀), B = B(ω₀), B' = B'(ω₀), B'' = B''(ω₀).
* "Phase" as used here differs from the way it is normally used in
engineering.
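If these approximations are valid in the neighborhood (ω₀ - Δ, ω₀ + Δ), it
follows that
$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{\int_0^{\omega_0-\Delta} A(\omega)^n e^{i[nB(\omega)+\omega t]}\,d\omega + \int_{\omega_0+\Delta}^{\infty} A(\omega)^n e^{i[nB(\omega)+\omega t]}\,d\omega + A^n\int_{\omega_0-\Delta}^{\omega_0+\Delta}\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2 + i\left(nB + nB'(\omega-\omega_0) + \frac{nB''}{2}(\omega-\omega_0)^2 + \omega t\right)\right]d\omega\right\}.$$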
Since [A(ω)/A]ⁿ → 0 as n → ∞ except near ω = ω₀, it follows as before that
the sum of the first two integrals can be made negligibly small in
comparison with the remaining one if n is taken sufficiently large.
Recalling that
$$t_0 = -nB'(\omega_0),$$
the remaining integral can be written as
$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{A^n e^{i(nB + \omega_0 t)}\int_{\omega_0-\Delta}^{\omega_0+\Delta}\exp\left[\left(\frac{nA''}{2A} + \frac{inB''}{2}\right)(\omega-\omega_0)^2 + i(t - t_0)(\omega-\omega_0)\right]d\omega\right\}.$$
Again the finite limits of integration can be replaced by -∞ and ∞ since,
for large n,
$$\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2\right]$$
will be small except in the immediate neighborhood of ω₀. If one sets
$$\rho = -\frac{n}{2}\left(\frac{A''}{A} + iB''\right), \qquad p = i(\omega - \omega_0), \qquad g = t - t_0,$$
then the remaining integral can be recognized as pair No. 710.0 of the
Campbell and Foster tables. Then one finds
$$V_n(t) = \frac{1}{\sqrt{\pi}}\,\mathrm{Re}\left\{\frac{A^n\,e^{i(nB + \omega_0 t)}}{\sqrt{\rho}}\exp\left[-\frac{(t-t_0)^2}{4\rho}\right]\right\}.$$
The result is equivalent to that given by equation (11) of Part I. If
A(ω₀) is greater than 1, it is thus seen that the response will have a
maximum value that builds up very rapidly as n increases and would
eventually force any system involving vacuum tubes to overload.
It should be remarked that the above approximation to the gain could only
be expected to be a reasonable one for fairly large values of n, since it
represents a usually unsymmetric gain characteristic by a symmetric
function. A better, or second, approximation can be obtained by keeping
the second term of the expansion of the logarithm in (21), and then taking
the first two terms of the expansion of
$$\exp\left[\frac{nA'''}{6A}(\omega-\omega_0)^3\right].$$
This yields
$$A(\omega)^n \approx A^n\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2\right]\left[1 + \frac{nA'''}{6A}(\omega-\omega_0)^3\right].$$
The addition of the second term in the above expression gives rise to an
additional term in Vn(t), provided that the same phase approximation (23)
is retained. The resulting Vn(t) is similar to (11), but the new envelope
consists of the old envelope plus nA'''/6A times the third derivative of
the old envelope. The modulated frequency remains the same, but the phase
is changed in a complicated manner. (Compare pair 710.3 of the Campbell and
Foster tables.)
Unit Step Input
In this case one can write
$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\int_0^{\infty} A(\omega)^n\,e^{i[nB(\omega) + \omega t - \pi/2]}\,\frac{d\omega}{\omega}.$$
As before, the only significant frequencies are in the neighborhood of
ω = ω₀, and near this point the 1/ω in the denominator can be taken out of
the integral as 1/ω₀, provided ω₀ ≠ 0. Thus the result will be the same as
for the impulsive input, apart from the factor 1/ω₀, if one makes
nB(ω) - π/2 correspond to nB(ω) in (11).
Low-Pass Case
It is clear that the analysis for this case, in which the equation
A'(ω) = 0 is satisfied for ω = 0, can be carried through in exactly the
same manner as the band-pass case treated previously. The resulting answer
is capable of simplification, however, if it is recalled that B(ω) for any
physical network is an odd function of ω. This forces both B(0) and B''(0)
to be zero. The resulting formulae then become
a) Impulsive Input
$$V_n(t) = \frac{A(0)^n}{\sqrt{2\pi n}}\left(\frac{|A''(0)|}{A(0)}\right)^{-1/2}\exp\left[-\frac{(t-t_0)^2\,A(0)}{2n\,|A''(0)|}\right]$$
b) Unit Step Input
(24) $$V_n(t) = \frac{A(0)^n}{\sqrt{2\pi n}}\left(\frac{|A''(0)|}{A(0)}\right)^{-1/2}\int_0^{t}\exp\left[-\frac{(t'-t_0)^2\,A(0)}{2n\,|A''(0)|}\right]dt'$$
where t₀ = -nB'(0). This last expression involves an integral, since it is
necessary to eliminate the pole at zero where A(ω) has its maximum. This
can be done by differentiating Vn(t) with respect to t, finding the
asymptotic formula for V'n(t) as before, and then integrating to obtain
(24).
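For the same illustrative low-pass ratio y(p) = 1/(1 + p) (an assumption chosen because it is exactly solvable: A(0) = 1, A''(0) = -1, t₀ = n), the exact unit-step response of n stages is the regularized incomplete gamma function P(n, t) = 1 - e^(-t)Σ_{k<n} t^k/k!, while the Gaussian-integral formula for the step response can be written with the error function. This sketch compares the two.

```python
import math


def exact_step(n: int, t: float) -> float:
    # P(n, t) = 1 - e^(-t) * sum_{k<n} t^k / k!, terms summed in log space
    s = sum(math.exp(k * math.log(t) - t - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - s


def formula_24(n: int, t: float) -> float:
    # step-response formula with A(0) = 1, A''(0) = -1, t0 = n:
    # integral from 0 to t of a Gaussian with mean n and variance n
    t0, var = n, n
    return 0.5 * (math.erf((t - t0) / math.sqrt(2 * var))
                  + math.erf(t0 / math.sqrt(2 * var)))


n = 400
for t in (n - 2 * math.sqrt(n), float(n), n + 2 * math.sqrt(n)):
    print(f"t = {t:6.1f}  exact = {exact_step(n, t):.4f}  (24) = {formula_24(n, t):.4f}")
```

The two agree to within about 0.01 here; the small offset near t = t₀ is again the skewness that the symmetric first approximation ignores.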
Hamy's Expansions in the Band-Pass Case
The type of asymptotic expansions so far given for the band-pass case were
explicitly designed to represent Vn(t) in the neighborhood of t = t₀, where
Vn(t) is a maximum. They could in no sense be considered the true
asymptotic expansions for values t ≪ t₀ or t ≫ t₀. In particular, their
derivation depended upon the fact that the time of maximum response was
related to the number of four-terminal networks by means of the equation
$$t_0 = -nB'(\omega_0),$$
so that as n → ∞, t₀ → ∞.
Other types of expansion are clearly possible. Two obvious alternatives
are:
(1) those valid for fixed n as t → ∞;
(2) those valid for fixed t as n → ∞.
The first of these will not be considered here, since they are of little
interest, as all of the four-terminal networks have been assumed to be
absolutely stable. The interested reader is referred to the book by Doetsch
on Laplace Transformations for expansions of this type.
Since the second type of expansion is of interest here and is not to be
found in most of the standard reference works, it will be discussed here
briefly.
In a classic paper, M. Hamy* derived general expansions of this type for
complex integrals of the form
$$\int f(z)\,\varphi^n(z)\,dz$$
under a variety of hypotheses on f(z) and φ(z). These conditions include
the case where φ(z) has a saddle point given by the solution of φ'(z) = 0,
and the result in this case is a generalization of the often-used theorem
of Fowler which one finds in his book on statistical mechanics under the
title of the saddle point method.
More to the point, they also include the case where φ(z) has one or more
maxima on the path of integration at which φ'(z) = 0, provided that f(z)
admits a Taylor series expansion about these points. In particular, then,
if one considers t as a fixed parameter, they apply to the integral of
equation (1), with c = 0 and φ(z) = y(p), f(z) = e^{pt}V₀(p).
* Journal de Mathématiques, vol. 4, 6th series, 1908, page 203.
In terms of our notation, one finds that:
(a) for an impulsive input with gain maximum at ω = ω₀,
$$V_n(t) \sim \frac{2A^n(\omega_0)}{nB'(\omega_0)}\cos[\omega_0 t + nB(\omega_0)] + \cdots$$
(b) for a unit step input with gain maximum at ω = ω₀ ≠ 0,
$$V_n(t) \sim \frac{2A^n(\omega_0)}{n\,\omega_0 B'(\omega_0)}\cos[\omega_0 t + nB(\omega_0)] + \cdots$$
It is interesting to note that these formulae indicate a dependence upon
1/n instead of 1/√n as in the case of the previous expansion. These
formulae can be thought of as representing the response in the band-pass
case for any fixed t, t ≪ t₀.
Appendix I
Certain remarks of Aurel Wintner* on the justification of the principle of
stationary phase are pertinent enough to the above discussion to bear
repetition here. In order for the integral
(25) $$S = \int f(x)\,e^{ip\varphi(x)}\,dx$$
to be asymptotically represented, as p → ∞, by the formula (cf. Lamb,
Hydrodynamics, p. 395)
(26) $$S \sim f(a)\sqrt{\frac{2\pi}{p\,|\varphi''(a)|}}\;e^{i[p\varphi(a) \pm \pi/4]},$$
where φ'(a) = 0 and where the upper or lower sign is to be taken according
as φ''(a) is positive or negative, it is evident that two things are
sufficient:
(1) The contribution to the integral outside a small interval around the
stationary value a of φ(x) must decrease more rapidly as a function of p
than the one obtained in the neighborhood of a;
(2) The asymptotic formula given above must adequately represent the
behavior of the contribution to the integral from the neighborhood of the
stationary value a.
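A minimal numerical check of the stationary-phase formula, under illustrative choices not taken from the text: f(x) = e^(-x²) and φ(x) = x²/2, so that a = 0, φ(a) = 0, φ''(a) = 1 and the upper sign applies. For these the integral over the whole real line has the closed form √(π/(1 - ip/2)), so the asymptotic value f(a)√(2π/p)e^(iπ/4) can be compared with it directly; the relative error should fall off roughly like 1/p.

```python
import cmath
import math


def exact(p: float) -> complex:
    # integral of exp(-x^2) exp(i p x^2 / 2) over the real line
    return cmath.sqrt(math.pi / (1 - 0.5j * p))


def stationary_phase(p: float) -> complex:
    # formula (26) with f(a) = 1, phi(a) = 0, phi''(a) = 1 (upper sign)
    return math.sqrt(2 * math.pi / p) * cmath.exp(1j * math.pi / 4)


for p in (10.0, 100.0, 1000.0):
    e, s = exact(p), stationary_phase(p)
    print(f"p = {p:6.0f}  relative error = {abs(e - s) / abs(e):.2e}")
```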
Now if, on any closed interval I, φ'(x) is continuous and has no zeros, and
if φ(x) is strictly monotone in this interval, then z = φ(x) can be
introduced as a variable of integration on that interval, transforming S
into
$$S_I = \int_I f(x)\,e^{ip\varphi(x)}\,dx = \int \frac{f[\varphi^{-1}(z)]}{\varphi'[\varphi^{-1}(z)]}\,e^{ipz}\,dz.$$
If, in addition to the above, φ(x) and φ''(x) are continuous and if f(x)
and f'(x) exist and are continuous, this last integral can be integrated by
parts, giving
$$S_I = \left[\frac{F(z)\,e^{ipz}}{ip}\right] - \frac{1}{ip}\int e^{ipz}\,\frac{dF(z)}{dz}\,dz, \qquad F(z) = \frac{f[\varphi^{-1}(z)]}{\varphi'[\varphi^{-1}(z)]},$$
and showing that on any such interval I,
$$S_I = O(1/p).$$
* "Method of Stationary Phase," Journal of Math. & Physics, vol. 24,
no. 3-4, 1945.
Thus condition (1) will be satisfied if, in the neighborhood of the
stationary point a, the contribution to the integral is of larger order
than O(1/p). This is clearly the case when the asymptotic formula (26) is
valid, since there the dependence on p is as 1/√p. It can be shown that
(26) is valid whenever φ'(a) = 0, φ''(a) ≠ 0, and φ'''(x) and f[φ⁻¹(z)] are
of bounded variation in the neighborhood of the stationary value. Thus, to
recapitulate, under these conditions the maximum contribution comes from
the stationary point and depends on p as 1/√p, while the points which are
not near the stationary point contribute terms depending upon p only as
1/p.
To conclude this brief appendix, it should be remarked that Wintner gives
an extension of (26) which is valid under the same condition on f[φ⁻¹(z)]
if the first n derivatives of φ(x) vanish at some point a while φ⁽ⁿ⁺¹⁾(x)
does not. These results could be used to extend the treatment of the
high-pass case given above to the cases in which a² + 2b = 0, etc.
C. L. DOLPH
C. E. SHANNON
Att.
B-392415 to 392428
[FIG. 3 and FIG. 16: drawings not reproduced. Legible labels include
"(n = 100)", "(n = 50)", "125 db", "100 db", and, in FIG. 16, the curves
marked "1ST APPROX." and "2ND APPROX."]
Electronic Methods in Telephone Switching
C. E. Shannon
In the recent development of electronic digital computing machines various new
tubes and other electronic devices have been designed which may be of use in
machine switching. In particular the "selectron" tube developed by R. C. A. and the
mercury acoustic delay tank provide large cheap memory devices in which information
can be registered or read off in electronic time intervals (of the order of
microseconds). Since one of the chief functions of the relays and switches in a
telephone exchange is that of memory (e.g. the relays remember which calling and
called lines should be connected together) it is worth while considering the possibility
of using such tubes to replace ordinary electro-mechanical switching equipment.
Suppose we have an exchange (or set of exchanges) serving n subscribers and that
the exchange can handle a peak load of m simultaneous conversations. These may be
between any m pairs of the subscribers. Thus the exchange must be capable of
assuming as many different states as there are ways of selecting m pairs of objects
from n. This can be done in
$$\frac{n!}{m!\,2^m\,(n-2m)!}$$
different ways. For n and m large the logarithm of this is approximately 2m log n.
If the logarithm is to the base ten then this is the required memory capacity of the
exchange measured in decimal digits. If the logarithmic base is two the units are
binary digits. A single two-position relay has a capacity of log 2 units (one binary
digit or .30103 decimal digits), while 5 relays have 5 log 2 units. A 10 × 10 crossbar
switch has a capacity of 10 log 10, while a single commutator on a panel has capacity
log r, where r is the number of vertical positions of the brushes. Hence the number
of relays required for a pure relay exchange would be
$$\frac{2m\log n}{\log 2},$$
the number of 10 × 10 crossbars would be
$$\frac{2m\log n}{10\log 10},$$
etc. To these estimates must be added the losses due to inefficient use of the memory
and also the memory of equipment used for functions other than merely remembering
which connections are being held at a given time.
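The memory estimate can be worked through numerically. The exchange size below (n = 10 000 subscribers, m = 1 000 simultaneous conversations) is an illustrative assumption, not a figure from the memorandum. The sketch compares the exact log₁₀ of the number of states with the 2m log n estimate, converts the estimate into relay and crossbar counts using the capacities quoted above, and repeats the selectron cost-per-digit arithmetic given later in the text.

```python
import math


def log10_states(n: int, m: int) -> float:
    # log10 of n! / (m! 2^m (n - 2m)!): ways of choosing m disjoint pairs from n
    ln = (math.lgamma(n + 1) - math.lgamma(m + 1) - m * math.log(2)
          - math.lgamma(n - 2 * m + 1))
    return ln / math.log(10)


def approx_log10_states(n: int, m: int) -> float:
    # the estimate 2m log n (base ten)
    return 2 * m * math.log10(n)


n, m = 10_000, 1_000  # illustrative exchange size
exact = log10_states(n, m)
approx = approx_log10_states(n, m)
relays = approx / math.log10(2)             # log 2 per two-position relay
crossbars = approx / (10 * math.log10(10))  # 10 log 10 per 10 x 10 crossbar
print(f"exact capacity : {exact:,.0f} decimal digits")
print(f"2m log n       : {approx:,.0f} decimal digits")
print(f"relays         : {relays:,.0f}")
print(f"10x10 crossbars: {crossbars:,.0f}")
print(f"selectron cost per binary digit: {100 / 4096 * 100:.2f} cents")
```

Since n!/(n - 2m)! ≤ n^(2m) and m! 2^m ≥ 1, the figure 2m log n is always an upper estimate; for these values the exact capacity is noticeably smaller because the m! 2^m term is not negligible.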
An ordinary relay is capable of remembering (by a holding circuit) one binary
digit. A pair of vacuum tubes in a flip-flop circuit has the same memory capacity.
The cost of these is of comparable magnitude, and thus if one designed an electronic
telephone exchange by merely changing relays to equivalent vacuum tube circuits, the
chief advantage of the electronic circuit would be one of speed, an improvement of
order 10³. In many cases this could produce a reduction of cost, since frequently many
identical units of a certain type must be supplied because the individual units are slow.
This is apt to be the case with units which are associated with the beginning or end of
calls but need not be used during the conversation. On the other hand, equipment to
be used throughout the call would offer less advantage under such tube-for-relay
replacement, since the expected duration of calls is long compared to electronic times.
The newer electronic memory devices, however, change this picture considerably.
A selectron tube (when these tubes are in production) may be expected to cost $100 or
less depending on the demand. It is capable of holding 4096 binary digits, giving a
cost per binary digit of the order of 2.5 cents, while the cost of the equivalent relay
may be of the order of 2.5 dollars. Mercury delay lines can store information at a
comparable cost. Thus it is not impossible that a reduction of the order 100 to 1 in
switching equipment cost might be possible by the use of electronic devices, even in
the parts where information must be stored for long periods of time.
An indication of how such tubes may be used is given in the attached figure.
Fig. 1 is a block diagram of a simplified exchange. The calling parties are connected
to an electronic commutator which samples the speech signals periodically and puts
the various lines in the time division multiplex. The called parties are also connected
in time division multiplex to a single channel by means of an electronic commutator
or distributor. The function of the middle part is to rearrange the samples in such a
way as to provide any desired interconnection between calling and called parties. This
is done by dividing the sampling period into two equal parts. During the first half the
signal plate of the upper selectron is connected by gate 1 into the calling line
multiplex channel. Its windows are caused to open in sequence. Thus at the end of
the first half-cycle the first samples of all the incoming channels have been written on
the face of the tube in their regular order. During the second half-cycle gates 1 and 3
are closed and gates 2 and 4 are opened. Thus the output of the selectron is fed into
the called line multiplex and the windows of the selectron are controlled by the other
selectron tube 2. This tube has registered, in a suitable notation, the numbers of the
called lines desired by the calling lines. The windows of this tube are opened
sequentially by the cycling unit and the numbers registered there control the windows
on tube 1 allowing the sample from calling channel 1 to go into the proper place in
the called line TDM.
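The rearrangement performed by the two selectrons amounts to what would now be called a time-slot interchange. The sketch below models one sampling period under simplifying assumptions: the storage tube is a list of positions, the register tube is represented as a mapping from each called-line slot to the stored calling-line position it should receive, and all names are hypothetical. In the first half-cycle the samples are written in calling-line order; in the second the called slots are visited in order and each reads the position named by the register.

```python
from typing import Dict, List


def time_slot_interchange(calling_samples: List[float], connect: Dict[int, int],
                          n_called: int) -> List[float]:
    """One sampling period of the two-tube scheme (hypothetical model).

    calling_samples[i] is the sample from calling line i in this period.
    connect[i] = j means calling line i is connected to called line j.
    Returns the called-line multiplex for the period (0.0 marks an idle slot).
    """
    # First half-cycle: write the samples onto the storage tube in calling-line order.
    store = list(calling_samples)
    # Register tube: for each called slot, the storage position to be read.
    register = {j: i for i, j in connect.items()}
    # Second half-cycle: step through the called slots in order, reading the
    # stored position named by the register.
    return [store[register[j]] if j in register else 0.0 for j in range(n_called)]


samples = [0.1, 0.2, 0.3, 0.4]
connections = {0: 2, 1: 0, 3: 1}  # calling 0 -> called 2, calling 1 -> called 0, ...
print(time_slot_interchange(samples, connections, n_called=4))  # [0.2, 0.4, 0.1, 0.0]
```

Storing the mapping on the read side, as here, is one possible reading of the register arrangement; an equivalent scheme writes each sample directly into its called-line position.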
By a more elaborate system it is possible to make use of the fact that only a small
fraction of the lines will be busy at a given time, as is done in ordinary relay
switching. This can be achieved by only supplying enough places in the distributors
for the peak load. When a call originates the calling and called parties are assigned
idle spaces in the distributor. The place assigned to the called party is registered in
the selectron register corresponding to the place assigned to the calling party.
Some Generalizations of the Sampling Theorem
We have seen that a function of time f(t) containing no frequencies over W
cycles per second can be described by giving its value at Nyquist
intervals (spaced 1/2W seconds apart). It can be reconstructed from these
samples using the basic function sin 2πWt/2πWt, together with the same
function shifted by integer numbers of Nyquist intervals. We now consider
some generalizations of this result.
In the first place the particular function sin 2πWt/2πWt is by no means
necessary for the reconstruction. In fact any function φ(t) lying within
the band W which contains all frequencies up to W is satisfactory. More
precisely, the spectrum of φ(t) should not vanish over any finite set of
frequencies (set of positive measure) up to W. If φ(t) satisfies this
condition the original function f(t) can be reconstructed using φ(t) and
its shifted images φ(t + K/2W). That is, coefficients a_K can be found such
that
$$f(t) = \sum_{K=-\infty}^{\infty} a_K\,\varphi\!\left(t + \frac{K}{2W}\right).$$
In general the coefficients are not found as easily as in the special case
where φ(t) = sin 2πWt/2πWt (when they are merely the values of f(t) at the
Nyquist points), but they may be calculated as follows. Let F(ω) be the
spectrum of f(t) and Φ(ω) be the spectrum of φ(t). Expand the function
F(ω)/Φ(ω) in a Fourier series using the band from -W to +W cycles per
second (-2πW < ω < 2πW) as the fundamental interval.
Thus
$$\frac{F(\omega)}{\Phi(\omega)} = \sum_K a_K\,e^{iK\omega/2W}$$
or
$$F(\omega) = \sum_K a_K\,\Phi(\omega)\,e^{iK\omega/2W}.$$
Taking the inverse transform of this equation we obtain the desired
expansion
$$f(t) = \sum_K a_K\,\varphi\!\left(t + \frac{K}{2W}\right).$$
The coefficients in the expansion can therefore be determined as the
coefficients of a Fourier series expansion of F(ω)/Φ(ω). In general the
functions φ(t + K/2W) will not form an orthogonal set, and therefore the
energy in f(t) cannot be found from Σ a_K² as it was in the simple case
where φ(t) = sin 2πWt/2πWt.
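A numerical sketch of this procedure, under illustrative assumptions not in the memorandum: W = 1 cycle per second; basic function φ(t) = [sin πWt/πWt]², whose spectrum is the triangle (1/W)(1 - |f|/W), nonzero throughout |f| < W; and test function f(t) = [sin πat/πat]² with a = 0.5 < W, whose spectrum is a triangle of half-width a. Frequencies are measured in cycles per second, so the exponential e^{iKω/2W} becomes e^{iπKf/W}. Because both spectra are real and even, the coefficients reduce to a_K = (1/W)∫₀^a [F(f)/Φ(f)] cos(πKf/W) df, computed here by Simpson's rule; the series is truncated at |K| ≤ 200.

```python
import math

W, a = 1.0, 0.5  # band limit and test-signal band (illustrative values)


def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def phi(t: float) -> float:
    return sinc(W * t) ** 2      # spectrum (1/W)(1 - |f|/W) for |f| < W


def f(t: float) -> float:
    return sinc(a * t) ** 2      # spectrum (1/a)(1 - |f|/a) for |f| < a


def ratio(fr: float) -> float:
    # F(f) / Phi(f) inside the band; zero for a <= |f| < W
    fr = abs(fr)
    if fr >= a:
        return 0.0
    return ((1 - fr / a) / a) / ((1 - fr / W) / W)


def simpson(g, lo: float, hi: float, steps: int = 2000) -> float:
    h = (hi - lo) / steps
    s = g(lo) + g(hi) + sum((4 if k % 2 else 2) * g(lo + k * h) for k in range(1, steps))
    return s * h / 3


K_MAX = 200
coef = {K: simpson(lambda fr: ratio(fr) * math.cos(math.pi * K * fr / W), 0.0, a) / W
        for K in range(K_MAX + 1)}
coef.update({-K: coef[K] for K in range(1, K_MAX + 1)})  # a_K is even in K here


def reconstruct(t: float) -> float:
    # f(t) = sum_K a_K phi(t + K / 2W)
    return sum(aK * phi(t + K / (2 * W)) for K, aK in coef.items())


for t in (0.0, 0.37, 1.9):
    print(f"t = {t}: f = {f(t):.6f}, series = {reconstruct(t):.6f}")
```

For this choice a_0 works out to 2 - 2 ln 2 in closed form, which gives a convenient check on the quadrature.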
A physical method of performing this expansion can also be given. Consider
a filter which gives the output sin 2πWt/2πWt when the input is φ(t). If
the function f(t) is passed through this filter, the amplitudes of the
output at Nyquist intervals will be the desired coefficients. This is true
since this output can be considered as expanded in the functions
sin 2πWt/2πWt with the amplitudes as coefficients, and the inverse filter
would restore the original function and change each of these functions
into φ(t) at the corresponding Nyquist point.
A function f(t) can also be determined from a knowledge of its value and
derivative at alternate Nyquist points. We have here the same number of
measurements per second, 2W, but half of these are ordinates of f(t) and
half are derivatives. The reconstruction of f(t) from these values can be
carried out simply using two basic functions:
$$\varphi_1(t) = \frac{\sin^2 \pi W t}{(\pi W t)^2}, \qquad \varphi_2(t) = \frac{\sin^2 \pi W t}{\pi^2 W^2 t}.$$
Both of these lie entirely within the band W. The function φ₁ has the
property that it and its first derivative vanish at alternate Nyquist
points (except for t = 0, where the function is 1 and its first derivative
0). Likewise φ₂ and φ₂' vanish at alternate Nyquist points except at t = 0,
where φ₂ = 0 and φ₂' = 1. Thus we can fit the ordinates of the original
function f(t) using φ₁ and its shifted images (shifted by two Nyquist
intervals). The derivatives of f(t) are fitted using φ₂ and its shifted
images. Due to the vanishing of these functions none of the fittings
interfere. The function constructed by this process must lie within the
band and have the same values and derivatives as the original function
f(t) at alternate Nyquist points. That there is only one such function can
be shown by arguments similar to those used in the basic sampling theorem,
generalized by breaking down the spectrum into an even and an odd part.
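The reconstruction from values and derivatives can be tried numerically with the basic functions φ₁(t) = sin²πWt/(πWt)² and φ₂(t) = sin²πWt/(π²W²t) = tφ₁(t). This sketch assumes W = 1 cycle per second and the test function f(t) = [sin πa(t - t_c)/πa(t - t_c)]² with a = 0.4 and t_c = 0.3 (both illustrative; the band a = 0.4 lies inside W). It samples f and f' at the alternate Nyquist points t_k = k/W and rebuilds f from φ₁, φ₂ and their shifted images, truncating the sum at |k| ≤ 2000.

```python
import math

W = 1.0            # band limit, cycles per second (assumed)
a, tc = 0.4, 0.3   # test signal: band a < W, centred at tc (assumed)


def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def phi1(t: float) -> float:
    # sin^2(pi W t) / (pi W t)^2
    return sinc(W * t) ** 2


def phi2(t: float) -> float:
    # sin^2(pi W t) / (pi^2 W^2 t), i.e. t * phi1(t)
    return t * sinc(W * t) ** 2


def f(t: float) -> float:
    return sinc(a * (t - tc)) ** 2


def df(t: float) -> float:
    # f'(t) = pi a * 2 sin u (u cos u - sin u) / u^3 with u = pi a (t - tc)
    u = math.pi * a * (t - tc)
    if u == 0:
        return 0.0
    return math.pi * a * 2 * math.sin(u) * (u * math.cos(u) - math.sin(u)) / u ** 3


def reconstruct(t: float, k_max: int = 2000) -> float:
    total = 0.0
    for k in range(-k_max, k_max + 1):
        tk = k / W  # alternate Nyquist points: spacing 2 * (1 / 2W)
        total += f(tk) * phi1(t - tk) + df(tk) * phi2(t - tk)
    return total


for t in (0.1, 0.77, -2.45):
    print(f"t = {t}: f = {f(t):.6f}, rebuilt = {reconstruct(t):.6f}")
```

The same number of measurements per second (W values plus W derivatives) recovers the signal, as the text states; only the truncation of the infinite sum limits the agreement.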
It is possible to carry this further and determine a
function from knowledge of its value and first (n - 1)
derivative at points separated n Nyquist intervals apart. In
this case the basic functions are
sin11 (Sgfc)
*1 =
n
(2nWtxn
1 n '
_ sinn (agt)
1 n '
s.nn (2^t}
n-2
/2nWt%
K~ n~";
rn 2nWt
n
These functions possess the properties:
1. They lie within the band W.
2. They vanish at $t = kn/(2W)$, $k = \pm 1, \pm 2, \ldots$ (that is, at every n-th Nyquist point), and so do their 1st, 2nd, ..., (n - 1)th derivatives.
3. At t = 0, the derivatives of $\varphi_s$ of orders 0, 1, ..., n - 1 all vanish except the s-th derivative, which is 1.
Consequently we can reconstruct f(t) by using $\varphi_s$ to adjust the s-th derivatives (s = 0, 1, ..., n - 1), and these adjustments will not interfere.
The functions $\varphi_s$ and their spectra are shown in Fig. 1 for the cases n = 1, 2, 3.
C. E. SHANNON
Att.
March 4, 1948
The Normal Ergodic Ensembles of Functions
Among the possible probability distributions in a one-
dimensional space certain ones are of special importance because
of their simple mathematical properties and frequent occurrence
in the physical world. The most important of these is the
normal or Gaussian distribution with a density function:
$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$
In an n-dimensional space the most important distribution func-
tion is an n-dimensional generalization of this, the n-
dimensional normal distribution:
$$\frac{|a_{ij}|^{1/2}}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i,j} a_{ij}\,x_i x_j\right)$$
Here $a_{ij}$ is the associated quadratic form and $|a_{ij}|$ the determinant of this form. This form is positive definite, and the surfaces of constant probability are found by setting the argument of the exponential function equal to a constant:
$$\sum_{i,j} a_{ij}\,x_i x_j = C,$$
and are therefore coaxial ellipsoids in the space. The directions of the axes of these ellipsoids are those of the eigenvectors of the form $a_{ij}$, and the lengths are inversely proportional to the square roots of the corresponding eigenvalues. By a rotation of axes the new coordinate system can be lined up with these directions and the distribution function reduced to
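$$p(y_1, \ldots, y_n) = \frac{(\lambda_1 \lambda_2 \cdots \lambda_n)^{1/2}}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{n}\lambda_i y_i^2\right)$$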
where the $\lambda_i$ are the (positive) eigenvalues and the $y_i$ are the new coordinates. The form $a_{ij}$, being positive definite, has an inverse $A_{ij}$ which is also positive definite, with eigenvalues $1/\lambda_i$.
The properties of the n-dimensional normal distribution which give it particular mathematical importance are the following.
1. If $x_i$ and $y_i$ are two chance vector variables which are independent and distributed according to n-dimensional normal distributions with quadratic forms $a_{ij}$ and $b_{ij}$ (inverses $A_{ij}$ and $B_{ij}$), then the chance vector variable $z_i = x_i + y_i$ is also distributed normally, with the form $c_{ij}$ whose inverse is
$$C_{ij} = A_{ij} + B_{ij}.$$
2. If $x_i$ is a normally distributed vector variable and $y_j = \sum_i r_{ji}\,x_i$ is a vector variable obtained from $x_i$ by a linear operation (possibly of smaller dimension than n), then $y_j$ is normally distributed with the inverse form
$$\sum_{s,t} r_{is}\, r_{jt}\, A_{st}.$$
3. Under certain quite broad conditions, the resultant of a large number of small independent chance vector variables $x_i^{(s)}$ (s = 1, 2, ..., N) with arbitrary distribution functions gives a normal distribution for
$$y_i = \sum_{s=1}^{N} x_i^{(s)}$$
with
$$B_{ij} = \sum_{s=1}^{N} A_{ij}^{(s)},$$
where $A_{ij}^{(s)}$ is the inverse form of $x^{(s)}$, provided no term of the sum contributes more than a small fraction to any $B_{ij}$.
4. If the a priori probabilities for each of two independent vectors $x_i$ and $y_i$ are both normal, the a posteriori probability of $x_i$ when we know the sum $x_i + y_i = z_i$ is normally distributed (about a displaced mean, however).
5. The mean value of $x_i x_j$ for $x_i$ normal (with zero mean) is given by
$$\overline{x_i x_j} = A_{ij}.$$
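A Monte Carlo sketch of properties 1 and 5 in two dimensions. The matrices A and B are hypothetical, and the vectors are drawn by the standard Cholesky construction, which is a choice of this example rather than anything the memo specifies. The sample averages of $x_i x_j$ are compared with $A_{ij}$, and those of the sum with $A_{ij} + B_{ij}$.

```python
import random

def cholesky2(m):
    """Lower-triangular L with L L^T = m for a 2x2 positive definite matrix m."""
    (a, b), (_, d) = m
    l11 = a ** 0.5
    l21 = b / l11
    return [[l11, 0.0], [l21, (d - l21 * l21) ** 0.5]]

def draw(L, rng):
    """One zero-mean 2-dimensional normal vector whose mean products are L L^T."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (L[0][0] * z1, L[1][0] * z1 + L[1][1] * z2)

def mean_products(vectors):
    """Sample averages of x_i x_j (property 5 says these approach A_ij)."""
    m = len(vectors)
    return [[sum(v[i] * v[j] for v in vectors) / m for j in range(2)] for i in range(2)]

A = [[2.0, 0.6], [0.6, 1.0]]    # hypothetical inverse form of x
B = [[0.5, -0.2], [-0.2, 1.5]]  # hypothetical inverse form of y
rng = random.Random(1)
LA, LB = cholesky2(A), cholesky2(B)
xs = [draw(LA, rng) for _ in range(100_000)]
ys = [draw(LB, rng) for _ in range(100_000)]
zs = [(x[0] + y[0], x[1] + y[1]) for x, y in zip(xs, ys)]

print(mean_products(xs))  # approximately A
print(mean_products(zs))  # approximately A + B, as property 1 states
```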
Among the many possible ergodic ensembles of functions $f_\alpha(t)$ there is also a certain class of particular mathematical and physical importance. This class of ensembles can be considered a generalization of the n-dimensional normal distribution to infinite-dimensional function spaces ergodic under translations in time. We shall call these normal ergodic ensembles of functions. They are completely specified by giving their power spectra $P(\omega)$ or their autocorrelation functions $A(\tau)$, which are the Fourier transforms of the power spectra. The normal ergodic ensembles can be defined in various ways. They occur physically when we pass thermal noise through a filter, shaping the power spectrum to $P(\omega) = |Y(\omega)|^2$, $Y(\omega)$ being the admittance of the filter.
In the literature on noise these ensembles are often treated in a loose, somewhat illogical fashion by using either of two "representations." The first representation is
$$\sum_{n=0}^{\infty}\sqrt{P(n\Delta f)\,\Delta f}\,\cos(n\Delta f\,t + \theta_n).$$
The $\theta_n$ are all uniformly and independently distributed over all values from 0 to $2\pi$. This representation amounts to making the noise the sum of a large number of small sinusoidal waves with random phases, and amplitudes adjusted to give the proper power density in any small frequency range. The frequency increment $\Delta f$ between adjacent waves is supposedly very small, and in use one evaluates any desired statistic of this set of functions and determines the limit approached by this statistic as $\Delta f \to 0$. This limit is taken to be the desired statistic of the normal ergodic ensemble. The second representation is similar but uses normally distributed amplitudes $a_n$ whose variance $\sigma_n^2$ is equal to $P(n\Delta f)$:
$$\sum_{n=0}^{\infty} a_n\sqrt{\Delta f}\,\cos(n\Delta f\,t + \theta_n).$$
Actually these "representations" will not give the correct answer in all cases. For example, if we ask what fraction of the functions in the representation ensemble $r_{\Delta f}$ are periodic, we find that all are, so the probability is unity, and the limit as $\Delta f \to 0$ is therefore also unity, while almost none of the functions in the normal ergodic ensemble are periodic. However, it can be shown that if we restrict ourselves to what we have called physical statistics, the answer will be identical; the normal ergodic ensemble is the physical limit of either of the above ensembles as $\Delta f \to 0$.
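The sketch below illustrates both the first representation and the objection to it. It assumes a hypothetical flat spectrum, P = 1 for 0 ≤ nΔf ≤ 1 and 0 above, and treats nΔf as an angular frequency, as the formula is written. Every member of the ensemble repeats exactly after 2π/Δf, yet its ensemble-average power agrees with Σ P(nΔf)Δf/2, the /2 being the mean of cos² over a random phase.

```python
import math
import random

def P(w):
    """Hypothetical power spectrum: flat for 0 <= w <= 1, zero above."""
    return 1.0 if w <= 1.0 else 0.0

def member(df, n_max, rng):
    """One function of the first representation, with random phases theta_n."""
    amps = [math.sqrt(P(n * df) * df) for n in range(n_max + 1)]
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_max + 1)]
    def f(t):
        return sum(a * math.cos(n * df * t + th)
                   for n, (a, th) in enumerate(zip(amps, thetas)))
    return f

df, n_max = 0.05, 20
rng = random.Random(5)

# Every member repeats exactly after 2*pi/df: all of them are periodic.
f = member(df, n_max, rng)
period = 2.0 * math.pi / df
print(f(1.3), f(1.3 + period))

# The ensemble mean of f(t)**2 matches sum_n P(n df) df / 2.
expected = sum(P(n * df) * df for n in range(n_max + 1)) / 2.0
estimate = sum(member(df, n_max, rng)(0.7) ** 2 for _ in range(10_000)) / 10_000
print(expected, estimate)
```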
A more logical definition of a normal ergodic ensemble can be given as follows. We divide the frequency range up into unit intervals and construct the sequence of "flat" ensembles for these intervals. These will be given by
$$\sum_n a_n \frac{\sin \pi(t - n)}{\pi(t - n)}.$$
These ensembles are passed through shaping filters to give the proper power spectrum in the interval in question, and the results are added.
The normal ergodic ensembles have properties analogous to those of the n-dimensional normal distributions given above. We have
Theorem: The sum $f_\alpha(t) + g_\beta(t)$, where f and g are from independent normal ergodic ensembles with spectra $P_1$ and $P_2$, is normal ergodic with spectrum $P_1 + P_2$.
Theorem: The output of any linear invariant transducer driven by a normal ergodic ensemble is normal ergodic with spectrum $|Y(\omega)|^2 P(\omega)$.
Theorem: Any finite-dimensional linear operation on a normal ergodic ensemble gives a normally distributed vector.
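The second theorem has a simple discrete-time analogue that can be checked numerically. In this sketch, unit-variance white normal samples stand in for thermal noise and a short hypothetical impulse response h stands in for the transducer. For such a filter the output autocorrelation at lag τ is $\sum_m h_m h_{m+\tau}$, the inverse transform of $|Y(\omega)|^2$ times the flat input spectrum.

```python
import random

def fir_filter(x, h):
    """y[k] = sum_m h[m] * x[k - m]: a linear, time-invariant transducer."""
    return [sum(h[m] * x[k - m] for m in range(len(h))) for k in range(len(h) - 1, len(x))]

def autocorr(y, lag):
    """Time average of y[k] * y[k + lag]."""
    return sum(y[k] * y[k + lag] for k in range(len(y) - lag)) / (len(y) - lag)

h = [1.0, 0.5, -0.25]  # hypothetical impulse response
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # unit-variance white normal input
y = fir_filter(x, h)

for lag in range(4):
    # Inverse transform of |Y|^2 for this filter: sum_m h[m] h[m + lag].
    predicted = sum(h[m] * h[m + lag] for m in range(len(h) - lag))
    print(lag, round(autocorr(y, lag), 3), predicted)
```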
March 15, 1948
C. E. SHANNON
Systems Which Approach the Ideal as P/N → ∞
We will show that it is possible to construct an instantaneous system, for sufficiently large P/N, for transmitting a sequence of binary digits such that the frequency of errors is arbitrarily small and the power required is only slightly greater in db than the ideal for the corrected rate of transmission. More precisely, we have the
Theorem: Given any ε > 0 and δ > 0, we can transmit binary digits on an instantaneous basis with frequency of errors < ε and corrected rate of transmission
$$R > W \log\left[1 + (1 - \delta)\frac{P}{N}\right].$$
The system to be used is of PCM type with an extremely large number of amplitude levels. Let there be $2^s$ levels, and number them in binary notation, but in the Stibitz type code, so that only one binary digit changes on going to an adjacent level. If we are in error by d levels, at most d binary digits of the s will be incorrect. If there are many levels in the σ distance of the noise, the expected number of errors per sample will be approximately some fixed number a. We take P/N large enough so that εs > a. Thus the frequency of errors in our final result will be < ε. The levels should not be spaced uniformly but according to the density of a normal distribution. If this is done, the received signal will be nearly Gaussian with $\sigma^2 = P + N$, and the corrected rate of transmission is
$$R > W \log\left[1 + (1 - \delta)\frac{P}{N}\right].$$
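The argument relies on one property of the level numbering: adjacent levels differ in a single binary digit, so an error of d levels corrupts at most d digits. The reflected binary (Gray) code has this property. Assuming it is equivalent to the Stibitz type code intended here, the sketch below checks the property for a hypothetical s = 8.

```python
def gray(i):
    """Reflected binary (Gray) code word for level number i."""
    return i ^ (i >> 1)

def digits_wrong(sent, received):
    """Number of binary digits that differ between the code words of two levels."""
    return bin(gray(sent) ^ gray(received)).count("1")

s = 8            # 2**s amplitude levels (assumed)
levels = 2 ** s

# Adjacent levels differ in exactly one binary digit.
assert all(digits_wrong(i, i + 1) == 1 for i in range(levels - 1))

# An error of d levels therefore corrupts at most d of the s digits.
excess = max(digits_wrong(i, i + d) - d
             for i in range(levels) for d in range(1, levels - i))
print(excess)  # 0: never more than d wrong digits
```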
C. E. SHANNON
March 29, 1948
Theorems on Statistical Sequences
If it is possible to go from any state with P > 0 to any other along a path of probability p > 0, the system is ergodic and the strong law of large numbers can be applied. Thus the number of times a given path $p_{ij}$ in the network is traversed in a long sequence of length N is about proportional to the probability of being at i and then choosing this path, $P_i p_{ij} N$. If N is large enough, the probability of a percentage error ±δ in this is less than ε, so that for all but a set of small probability the actual numbers lie within the limits
$$(P_i p_{ij} \pm \delta)N.$$
Hence the probability of nearly all sequences is given by
$$p = \prod_{i,j} p_{ij}^{(P_i p_{ij} \pm \delta)N}$$
and $\frac{\log p}{N}$ is limited by
$$\sum_{i,j} (P_i p_{ij} \pm \delta)\log p_{ij}$$
or
$$\left|\frac{\log p}{N} - \sum_{i,j} P_i p_{ij} \log p_{ij}\right| < \eta.$$
Thus we have the
Theorem: For almost all sequences
$$\lim_{N\to\infty} \frac{-\log p}{N} = H = -\sum_{i,j} P_i p_{ij} \log p_{ij},$$
where p is the probability of the sequence, that is, of the block of length N starting at the first position.
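A simulation sketch of this theorem for a hypothetical two-state source (the transition probabilities are invented for illustration). Along long sampled paths, $-\log_2 p / N$ clusters around $H = -\sum P_i p_{ij} \log_2 p_{ij}$.

```python
import math
import random

# Hypothetical two-state source: p[i][j] is the probability of the path from state i to j.
p = [[0.9, 0.1],
     [0.4, 0.6]]

# Equilibrium state probabilities; for two states P_0 = p_10 / (p_01 + p_10).
P0 = p[1][0] / (p[0][1] + p[1][0])
P = [P0, 1.0 - P0]

H = -sum(P[i] * p[i][j] * math.log2(p[i][j]) for i in range(2) for j in range(2))

def rate_of_sample(N, rng):
    """Draw a sequence of N transitions and return -log2(p) / N for it."""
    state = 0 if rng.random() < P[0] else 1
    log_p = math.log2(P[state])
    for _ in range(N):
        nxt = 0 if rng.random() < p[state][0] else 1
        log_p += math.log2(p[state][nxt])
        state = nxt
    return -log_p / N

rng = random.Random(3)
samples = [rate_of_sample(20_000, rng) for _ in range(5)]
print(round(H, 4), [round(r, 4) for r in samples])
```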
Thus for all but a set of blocks of probability < ε, and for N large enough,
$$(H - \eta)N < -\log p < (H + \eta)N,$$
$$\sum p\,(H - \eta)N < -\sum p \log p < \sum p\,(H + \eta)N,$$
where we have summed over all but the set of small probability. Here
$$\sum p\,(H + \eta)N = (H + \eta)N\sum p \le (H + \eta)N$$
and
$$\sum p\,(H - \eta)N = (H - \eta)N\sum p \ge (H - \eta)(1 - \varepsilon)N.$$
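For the set of small probability,
$$-\sum p \log p \le \varepsilon \log \frac{Q}{\varepsilon},$$
where Q is the number of blocks in the set, since this sum is maximized for $\sum p = \varepsilon$ by making all the p equal. The number Q cannot exceed $a^N$, where a is the number of possible symbols, so this is dominated by
$$\left|\sum p \log p\right| \le \varepsilon N \log a + \varepsilon \log \frac{1}{\varepsilon},$$
which is less than ηN, with η as small as desired for sufficiently large N and small ε. Hence this does not affect the sum in the limit as N → ∞, and we have the
Theorem:
$$\lim_{L\to\infty} -\frac{1}{L}\sum p(B_L)\log p(B_L) = H,$$
where $p(B_L)$ is the probability of block $B_L$ of length L, and the sum is over all possible blocks.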
We now prove the
Theorem:
$$H = -\sum p(B_i, S_j)\log p_{B_i}(S_j) = \lim_{L\to\infty} -\sum q(B_i, S_j)\log q_{B_i}(S_j),$$
where $p(B_i, S_j)$ is the probability of block $B_i$ followed by $S_j$, and $p_{B_i}(S_j)$ is the conditional probability of $S_j$ after the block $B_i$ is known to occur. $q(B_i, S_j)$ is the probability when $B_i$ is computed on the basis of any initial state probabilities, not necessarily the proper ones, and $q_{B_i}(S_j)$ the corresponding conditional probabilities.
The first equality is true since we may sum first on all $B_i$ leading to a given state k. The terms $p_{B_i}(S_j)$ are then all equal to $p_{kj}$, and the terms $p(B_i, S_j)$ sum to $P_k p_{kj}$, giving the desired result.
If the q's are used, the $q_{B_i}(S_j)$ are still $p_{kj}$, where k is the state in which $B_i$ ends, and
$$\sum q(B_i, S_j) \to p_{kj}\sum p(B_i),$$
the sums being over the blocks ending in state k, since any initial distribution tends toward equilibrium.
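We have shown that, apart from a set of small probability, the probabilities of blocks of length M lie within the limits
$$2^{-(H + \delta)M} < p < 2^{-(H - \delta)M},$$
where δ can be made small by taking M large enough. Let $G_M(\varepsilon)$ be the number of blocks of length M that remain when we delete the largest possible number of blocks making up a set of measure ε. Then
$$\sum_{\text{remaining set}} p = 1 - \varepsilon,$$
$$1 - \varepsilon \le G_M(\varepsilon)\,p_{\max} < G_M(\varepsilon)\,2^{-(H - \delta)M},$$
$$\log G_M(\varepsilon) > (H - \delta)M + \log(1 - \varepsilon).$$
Hence, writing $\varphi(\varepsilon) = \lim_{M\to\infty} \frac{\log G_M(\varepsilon)}{M}$, we have $\varphi(\varepsilon) \ge H - \delta$.
Similarly
$$1 \ge \sum p \ge G_M(\varepsilon)\,p_{\min} > G_M(\varepsilon)\,2^{-(H + \delta)M},$$
from which we obtain
$$\log G_M(\varepsilon) < (H + \delta)M$$
and $\varphi(\varepsilon) \le H + \delta$. Since δ can be taken arbitrarily small, we have the
Theorem: $\varphi(\varepsilon) = H$ for $\varepsilon \ne 0, 1$.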
The fact that for large M nearly all blocks have a probability limited by
$$\left|\frac{-\log p}{M} - H\right| < \delta$$
does not imply that those probabilities approach equality. In fact they will generally diverge from one another, but the db range becomes small compared to M, since for p's satisfying this inequality
$$\frac{\log p_{\max} - \log p_{\min}}{M} < 2\delta.$$
It is possible to show, however, that there exists among the blocks of length M a subset, all of equal probability, which has the same growth with M as the set including all blocks except those of small probability totaling less than ε: namely, the subset will contain more than $2^{(H - \delta)M}$ elements, with δ arbitrarily small.
Consider all blocks beginning in a given state, say state 1, and ending in this state without passing through it in between. Let these blocks $B_1, B_2, \ldots$ have lengths $n_1, n_2, \ldots, n_i, \ldots$ and conditional probabilities $p_1, p_2, \ldots, p_i, \ldots$ when we start from state 1. We first prove the
Theorem:
$$\sum_i p_i n_i = P_1^{-1}, \qquad H = -P_1\sum_i p_i \log p_i.$$
The first part is true since the ergodic character of the system makes the inverse frequency of occurrence of state 1 equal to the mean distance between its occurrences, $\sum p_i n_i$. The second part is true since almost all blocks of large length N have approximately the proper frequency of each $B_i$.
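The first part of this theorem, $\sum p_i n_i = P_1^{-1}$, can be checked by simulation. The sketch assumes a hypothetical three-state chain, finds the equilibrium probabilities by iteration, and compares the average length of first-return blocks from one state with the reciprocal of that state's probability. States are numbered from 0 in the code.

```python
import random

# Hypothetical three-state source; row i lists the probabilities p_ij of going from i to j.
p = [[0.5, 0.3, 0.2],
     [0.2, 0.7, 0.1],
     [0.3, 0.3, 0.4]]

def equilibrium(p, iters=2000):
    """State probabilities P_i, found by iterating P <- P p from a uniform start."""
    P = [1.0 / len(p)] * len(p)
    for _ in range(iters):
        P = [sum(P[i] * p[i][j] for i in range(len(p))) for j in range(len(p))]
    return P

def step(state, rng):
    """Choose the next state according to row `state` of p."""
    r, acc = rng.random(), 0.0
    for j, pj in enumerate(p[state]):
        acc += pj
        if r < acc:
            return j
    return len(p) - 1  # guard against rounding in the running sum

def mean_block_length(start, trials, rng):
    """Average length of blocks that leave `start` and first return to it."""
    total = 0
    for _ in range(trials):
        state, length = step(start, rng), 1
        while state != start:
            state, length = step(state, rng), length + 1
        total += length
    return total / trials

P = equilibrium(p)
rng = random.Random(11)
print(mean_block_length(0, 50_000, rng), 1.0 / P[0])
```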
Now we return to the construction of a subset of growth $2^{(H - \delta)N}$, all of equal probability. Let us choose integers $a_i$ as close as possible to $N P_1 p_i$ and construct sequences with $a_i$ of the block $B_i$. The number of blocks is then
$$\sum_i a_i \approx N P_1,$$
and the number of sequences is
$$\frac{\left(\sum_i a_i\right)!}{\prod_i a_i!} \approx 2^{-N P_1 \sum_i p_i \log p_i}.$$
The growth is then, in terms of symbols,
$$\frac{\log(\text{number of sequences})}{N} \approx -P_1\sum_i p_i \log p_i = H.$$
This proves the following
Theorem: Given δ > 0, there exists a set of M blocks of length N (when N is sufficiently large) such that
$$M > 2^{(H - \delta)N},$$
and each block has the same probability, and starts and ends in the same state, which can be chosen arbitrarily.
In case the system is not ergodic but made up of a finite number of ergodic systems,
$$\Gamma = \sum_i c_i \Gamma_i,$$
each $\Gamma_i$ will have a rate $H_i$, which we may assume arranged in a non-increasing sequence
$$H_1 \ge H_2 \ge \cdots \ge H_k.$$
The function $\varphi(\varepsilon)$ then becomes a decreasing step function, in the manner indicated by the following
Theorem: In the case considered,
$$\varphi(\varepsilon) = H_j \quad \text{in the interval} \quad \sum_{i=1}^{j-1} c_i < \varepsilon < \sum_{i=1}^{j} c_i.$$
For if ε is in the range indicated, we must take a set of positive probability from at least one of $\Gamma_1, \ldots, \Gamma_j$. This gives a growth of type $2^{H_j N}$ at least, and it can be limited to this by choosing all sequences from $\Gamma_j, \ldots, \Gamma_k$. The quantity
$$\bar{H} = \sum_i c_i H_i$$
will be called the mean statistical rate for the system.
C. E. SHANNON
April 26, 1948
Samples of Statistical English
C. E. Shannon
A number of samples of statistical English including
probability structure out to four words are given below. These
were constructed by starting off with three words from a book.
These three words are shown to someone who fits them in a
reasonable English sentence and writes down the word following
the three. The first word is then covered up and the process
repeated with a different person, etc. If the imagined sentence
ends after the added word, the person writing the word adds a
period. For samples bearing a title the participants were told
that this was the subject dealt with. These samples may be
compared with those in "A Mathematical Theory of Communication"
where less statistical structure is included.
The samples given here were obtained, for the most
part, with the aid of J. R. Pierce, B. McMillan, C. C. Cutler
and W. E. Mathews. A few of the samples were obtained from
other sources (contemporary literature, etc.) and are included
for comparison. The reader may try his skill at guessing which
are statistically constructed. The true sources are given at
the end.
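The participants supplied each new word from their own sense of English. A mechanical stand-in for the same procedure, sketched below as an assumption rather than the method used for these samples, draws the next word from the words that follow the last three words in some corpus. The tiny corpus is hypothetical, so the output only shows the mechanics.

```python
import random
from collections import defaultdict

def continuation_table(words, order=3):
    """Map every run of `order` consecutive words to the list of words that followed it."""
    table = defaultdict(list)
    for k in range(len(words) - order):
        table[tuple(words[k:k + order])].append(words[k + order])
    return table

def statistical_text(words, length, seed=0, order=3):
    """Start from the first `order` words and repeatedly add a word seen after the last `order`."""
    rng = random.Random(seed)
    table = continuation_table(words, order)
    out = list(words[:order])
    while len(out) < length:
        followers = table.get(tuple(out[-order:]))
        if not followers:  # this context is never continued in the corpus
            break
        out.append(rng.choice(followers))  # repeated followers weight the choice
    return out

corpus = ("the cat sat on the mat and the cat ran to the mat "
          "and the dog sat on the cat").split()
print(" ".join(statistical_text(corpus, 20, seed=4)))
```

Because every added word is drawn from an observed continuation, each run of four consecutive words in the output occurs somewhere in the corpus, which is the four-word structure described above.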
1. This was the first. The second time it happened without
his approval. Nevertheless it cannot be done. It could
hardly have been the only living veteran of the foreign
power had stated that never more could happen. Conse-
quently people seldom try it.
2. John now disported a fine new hat. I paid plenty for the
food. When cooked asparagus has a delicious flavor sug-
gesting apples. If anyone wants my wife or any other
physicist would not believe my own eyes. I would believe
my own word.
3. That was a relief whenever you be let your mind go free
who knows if that pork chop I took with my cup of tea
after was quite good with the heat I couldn't smell anything
off it I'm sure that queer looking man in the
4. In a few days was the minimum amount of money remaining to
the end. However everyone knows the meaning implied. It
was true when Cutler says that we should proceed carefully.
When you love yourself too much. The woman who
accosted
5. Fourscore and twenty years passed before we could meet them
that isn't already done should have been a good son is
going fast according to the teacher of his ability. His
intelligence sufficed for the time. This cannot change
much.
6. Even the killing was atrociously perpretated by the
cruelest treatment that a small boy jumped over the hedge
and buried her. A grave fault of many approaches to the
furthermost reaches of the state. Politics and business
are becoming lost to the .
7. It is an Italian ox mouth dish. The only thing in the
room is worms. I am the director of the seminar. In an
evolving hemisphere. C'est Monsieur Jardin. I am a
patient. Oh my dear Plapsen, you are my dearest Klapsen.
8. He took it with many other matters are more apparent if
they think so. Is there a reason for supposing that
most people don't. Nevertheless sex is absolutely neces-
sary as though the electron diffraction camera plate up
on the top surface of
9. Fifteen years before the mast, he ever had eaten. Try
it and see, I believe that whatever arises a fund has
been accumulated sufficiently in the near future holds
many surprises. No man can judge his actions by his wife
Susie.
10. I forget whether he went on and on. Finally he stipulated
that this must stop immediately after this. The last time
I saw him when she lived. It happened one frosty look of
trees waving gracefully against the wall. You never can
11. When I bought my wife a long time ago. I knew that it
wasn't faster when he didn't eat or drink a toast to
John Doe, otherwise known as McMillan's theorem.
Whatever the nature of Christ's teachings. Go far into
12. McMillan's Theorem
McMillan's theorem states that whenever electrons diffuse
in vacua. Conversely impurities of a cathode. No sub-
stitution of variables in the equation relating these
quantities. Functions relating hypergeometric series
with confluent terms converging to limits uniformly
expanding rationally to represent any function.
13. House Cleaning
First empty the furniture of the master bedroom and bath.
Toilets are to be washed after polishing doorknobs the
rest of the room. Washing windows semi-annually is to be
taken by small aids such as husbands are prone to omit
14. Epiminondas
Epiminondas was one who was powerful especially on land
and sea. He was the leader of great fleet maneuvers and
open sea battles against Pelopidas but had been struck on
the head during the second Punic war because of the wreck
of an armored frigate.
15. Salaries
Money isn't everything. However, we need considerably
more incentive to produce efficiently. On the other hand
too little and too late to suggest a raise without a reason
for remuneration obviously less than they need although
they really are extremely meager.
16. Murder Story
When I killed her I stabbed Claude between his powerful
jaws clamped cruelly together. Screaming loudly despite
fatal consequences in the struggle for life began ebbing
as he coughed hallowly spitting blood from his ears.
Burial seemed unnecessary since further division was
necessary.
The sources are: 3, from "Ulysses" by James Joyce, page 748; 7 and 14, the conversation and writings of two schizophrenic patients (quoted from Bleuler, "A Textbook of Psychiatry"). All others were constructed by statistical means.
C. E. SHANNON
June 11, 1948
The Department of Defense
H DEVELOPMENT
Washington 25, D. C.
Prepared by
THE PANEL OF COMMUNICATIONS OF
THE COMMITTEE ON ELECTRONICS
Approved:
Chairman
5. SIGNIFICANCE AND APPLICATION
C. E. Shannon
Bell Telephone Laboratories
Murray Hill, N. J.
1. Introduction.
A general communication system is shown in Figure 3. An information source
produces a message. This is encoded in a transmitter to produce a signal suitable for
transmission over the channel. During transmission the signal may be perturbed by
noise. The perturbed signal is decoded or demodulated at the receiver to recover, as
well as possible, the original message.
The situation is roughly analogous to a transportation system for transporting physical
goods from one point to another. We can imagine, for example, a lumber mill producing
lumber at an average rate of R cubic feet per second and a conveyor system capable of
transporting C cubic feet per second. If R is greater than C the full output of the mill
cannot possibly be carried on the conveyor. On the other hand, if R is less than or equal
to C it may or may not be possible, depending on whether the lumber can be efficiently
packed in the available space of the conveyor. However, if we allow ourselves to saw
the lumber up into suitable sizes and shapes we can always approach 100 per cent effi-
ciency in packing. In this case we must, of course, supply a carpenter shop at the other
end of the conveyor to reassemble the lumber in its original form before passing it on to the consumer.
If the analogy is sound we might hope to define two parameters R and C associated with an information source and a channel, respectively. R should measure, in some sense, how much information is produced per second by the source, and C the capacity of the channel when used in the most efficient manner for transmitting information. We would expect then that if R > C the full output of the source cannot be transmitted satisfactorily. If R ≤ C it should be possible to transmit the output of the source by proper encoding and decoding at transmitter and receiver. It turns out that it is possible to define quantities R and C which measure these information rates and capacities and satisfy the desired relationships. We will attempt to show how this can be done without, however, giving mathematical proofs of the results.¹
¹For mathematical details, see Shannon, C. E., "A Mathematical Theory of Communication," Bell System Technical Journal, July and October, 1948. See also Shannon, C. E., "Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).
2. The Information Source.
The first problem is that of clarifying the nature of "information" and finding a measure of the rate of production for an information source.
Information involves basically the concept of "choice." An information source chooses one particular message from a set of possible messages. If there were only one possible message there would be no communication problem. The amount of information produced by a source must evidently be related to the range of choice available.
The simplest possible choice is a choice from two equally likely possibilities, say
0 or 1. We shall call the corresponding unit of information a binary digit or "bit." A
relay or flip-flop circuit has two possible states and is capable of storing one bit of
information.
A device which chooses at random from 0 or 1 making one choice each second is
considered to be producing information at rate R of one bit per second. Such a source
produces a "message" which is a random sequence of 0's and 1's.
A choice from, say, 32 equally likely possibilities can be considered as a series of five choices, each from two equally likely possibilities, and, therefore, should correspond to five bits. More generally, a choice from n equally likely possibilities represents $\log_2 n$ bits.
Suppose now that the various possible choices have different probabilities of occurrence, say $p_1, p_2, \ldots, p_n$. How much information is produced when a choice is made under
these circumstances? One feels intuitively that less "choice" is involved in a device
which chooses between 0 and 1 with probabilities .01 and .99 than in one which chooses
with equal probabilities. In the former case the result is almost sure to be 1.
The following example shows that by proper encoding an average compression can be obtained by using the probabilities $p_1, p_2, \ldots, p_n$. Suppose there are four possible choices A, B, C, D with probabilities $p_A = 1/2$, $p_B = 1/4$, $p_C = 1/8$, $p_D = 1/8$. If we use a simple
direct code into binary digits:
A = 00 B = 01 C = 10 D = 11,
we use two binary digits per letter. On the other hand, using the following code where
more probable letters are given short codes and less probable letters longer codes, we
obtain an average saving
A = 0 B = 10 C = 110 D = 111.
This is a reversible code; the original text can be recovered from the encoded sequences
as is readily verified. With this code we need, on the average, only
(1/2 x 1 + 1/4 x 2 + 1/8 x 3 + 1/8 x 3) = 1 3/4
binary digits per letter. We may say then that a choice with probabilities 1/2, 1/4, 1/8,
1/8 corresponds to 1 3/4 bits of information. If an information source were producing
a sequence of the letters A, B, C, D with these probabilities we could encode it into a
sequence of binary digits in which 1 3/4 binary digits are used on the average for each
letter of message.
A general analysis of the situation shows that if the letters are chosen with probabilities $p_1, p_2, \ldots, p_n$, then it is possible to encode into binary digits using
$$H = -\sum_i p_i \log_2 p_i$$
binary digits per letter of message on the average, and there is no method of reversible
encoding using less. This H then is the equivalent number of bits per letter, and, if the
source produces n letters per second, R = nH is the rate of production in bits per second.
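The A, B, C, D example and the general formula can be checked together in a few lines; the sample message is arbitrary. Decoding simply reads bits until a code word is complete, which works because no code word is the beginning of another.

```python
import math

probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(text):
    return "".join(code[ch] for ch in text)

def decode(bits):
    """Read bits left to right, emitting a letter whenever a code word is complete."""
    inverse = {v: k for k, v in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)

avg_len = sum(probs[s] * len(code[s]) for s in probs)  # binary digits per letter
H = -sum(q * math.log2(q) for q in probs.values())     # bits per letter
msg = "ABACABADAAB"
assert decode(encode(msg)) == msg
print(avg_len, H)  # 1.75 1.75
```

Here the average code length equals H exactly because every probability is a power of 1/2; for other probabilities H is a lower bound that good codes approach.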
In the case of English text the statistical structure is more involved. There are the single-letter probabilities $p_i$, but there are also statistical influences between nearby letters. For example, the letter T is more often followed by H than by any other letter, a Q is almost invariably followed by U, etc. In such cases there is a more general formula for calculating the equivalent number of bits per letter of message. Let $p(i, j, \ldots, s)$ be the probability in the language of the sequence of letters $i, j, \ldots, s$. Then we define
$$G_n = -\frac{1}{n}\sum p(i, j, \ldots, s)\log_2 p(i, j, \ldots, s),$$
where the sum is over all sequences of letters which are just n letters long. The quantities $G_1, G_2, \ldots, G_n, \ldots$ represent a series of approximations to the desired H which take into account more and more of the statistical structure as we proceed along the sequence. The information per letter of message can be defined by the limiting value of the G's:
$$H = \lim_{n\to\infty} G_n.$$
It can be shown that H has the desired properties; namely, we can encode the messages
from the source into binary digits using H binary digits per letter on the average, and no
method of encoding uses less.
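As a sketch, $G_n$ can be estimated from a sample of text by replacing $p(i, j, \ldots, s)$ with the observed frequencies of overlapping n-letter blocks. This frequency-counting estimate is an assumption of the example, and with a finite sample it only approximates the $G_n$ of the language. A perfectly periodic test string makes the behaviour easy to see: $G_1 = 1$ bit, $G_2 \approx 1/2$, $G_3 = 1/3$, and $G_n$ tends toward H = 0.

```python
import math
from collections import Counter

def G(text, n):
    """Estimate G_n = -(1/n) * sum p(block) log2 p(block) over the n-letter blocks of text."""
    blocks = Counter(text[k:k + n] for k in range(len(text) - n + 1))
    total = sum(blocks.values())
    return -sum(c / total * math.log2(c / total) for c in blocks.values()) / n

sample = "ab" * 50
print([round(G(sample, n), 4) for n in (1, 2, 3, 4)])  # [1.0, 0.5, 0.3333, 0.25]
```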
For the English language H has been estimated at roughly 2 bits per letter, taking
account only of the statistical structure out to about 6 or 8 letters.
If the messages produced by the information source are continuous functions of time, as in speech or television transmission, the situation is much more involved and we will not discuss it in detail. It is still possible to assign a rate of production of information in bits per second to such a source, but the rate now depends on other considerations.
With continuous functions as messages, exact reproduction is not generally required and
the rate R depends on the amount and nature of the discrepancy which can be tolerated
between the original and recovered messages. The tolerable discrepancy in turn is
determined by the final destination of the messages. With speech, for example, the toler-
able errors depend on the structure of the human ear and brain.
Although the mathematical problems involved in defining the rate for a continuous
source have been completely solved, it is in practical cases very difficult to estimate R.
The following calculation may be of some interest, however. Suppose we are interested
only in transmitting English speech (no music or other sounds), and the quality requirements on reproduction are only that it be intelligible as to meaning. Personal accents, inflections, etc., can be lost in the process of transmission. In such a case we could, at least in principle, transmit by the following scheme. A device is constructed at the transmitter which prints the English text corresponding to the spoken words. These can be translated into binary digits in the ratio of about two binary digits per letter, or 2 × 4.5 = 9 per word. Taking 100 words per minute as a reasonable talking speed, we obtain 900 bits per minute or 15 bits per second as an estimate of the rate for English speech when intelligibility is the only fidelity requirement.
3. The Capacity of a Channel.
We now consider the problem of defining the capacity C of a channel for transmitting information. Since we have measured the rate of production for an information source in bits per second, the natural question is: how many bits per second can be transmitted over a given channel?
In some cases the answer is simple. With a teletype channel sending n characters per second, each a choice from 32 possibilities, we can send 5n bits per second.
Suppose now that the channel is defined as one that can transmit any functions of time f(t) which lie within a certain frequency band W cycles per second wide. It is known that a function of this type can be specified by its values at a series of equally spaced sampling points 1/(2W) seconds apart. Thus we may say that such a function has 2W degrees of freedom, or dimensions, per second.
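If there is no noise whatever, the capacity is infinite. Even when there is noise, if we place no limitation on the transmitter power, the capacity will be infinite, for we may use an unlimited number of different amplitude levels spaced far apart compared with the noise. The capacity depends, of course, on the nature of the noise and on the transmitter power limitation.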
The simplest type of noise is white thermal noise, in which the power is distributed uniformly over the band, the distribution of amplitudes is Gaussian, and the average power is N when delivered into a unit resistance.
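The simplest limitation on transmitter power is to require that the average power of the transmitted signal not exceed a given value P. In terms of the three parameters W, P, and N, the capacity C can be calculated. It turns out to be
$$C = W \log_2 \frac{P + N}{N} \quad \text{(bits per second)}.$$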
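This can be seen roughly as follows. The received signal has an average power P + N and the noise an average power N, so we can distinguish about
$$\sqrt{\frac{P + N}{N}}$$
different amplitudes at each sample point. In a time T there will be 2TW independent samples. Thus there are approximately
$$M = \left(\sqrt{\frac{P + N}{N}}\right)^{2TW} = \left(\frac{P + N}{N}\right)^{TW}$$
different signal functions of duration T that can be distinguished from one another in spite of the noise. This corresponds to
$$\log_2 M = TW \log_2 \frac{P + N}{N}$$
binary digits in the time T, or
$$C = W \log_2 \frac{P + N}{N}$$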
binary digits per second. This formula has a much deeper and more precise significance than the above argument would indicate. In fact it can be shown that it is possible, by properly choosing our signal functions, to transmit $W \log_2 \frac{P + N}{N}$ binary digits per second with as small a frequency of errors as desired. It is not possible to transmit binary digits at any higher rate with an arbitrarily small frequency of errors. This means that the capacity is a sharply defined quantity in spite of the noise. These statements are proved by two different methods.*
The formula for C applies for all values of P/N. Even when P/N is very small, the average noise power being much greater than the average transmitter power, it is possible to transmit binary digits at the rate $W \log_2 \frac{P + N}{N}$ with as small a frequency of errors as desired. In this case $\log_2(1 + P/N)$ is approximated by $\frac{P}{N}\log_2 e = 1.443\,\frac{P}{N}$, and we have approximately
$$C = 1.443\,W\frac{P}{N}.$$
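A few lines suffice to tabulate the capacity formula and compare it with the small-P/N approximation. The band W = 3000 cycles per second is a hypothetical value chosen only for illustration. At P/N = 0.01 the two agree to within about half a percent, while at P/N = 10 the linear approximation is far too large.

```python
import math

def capacity(W, P, N):
    """C = W log2((P + N) / N) bits per second for white Gaussian noise."""
    return W * math.log2((P + N) / N)

def capacity_small_snr(W, P, N):
    """Approximation for P/N << 1: C ~ (P/N) W log2(e) = 1.443 W P/N."""
    return math.log2(math.e) * W * P / N

W = 3000.0  # hypothetical band, cycles per second
for snr in (0.01, 0.1, 1.0, 10.0):
    print(f"P/N = {snr:5}: C = {capacity(W, snr, 1.0):8.1f}, "
          f"1.443 W P/N = {capacity_small_snr(W, snr, 1.0):8.1f} bits per second")
```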
It should be emphasized that it is only possible to transmit at a rate C over a channel
by properly encoding the information. In general, the rate C is only approached as a limit
by using more and more complex encoding and longer and longer delays at both trans-
mitter and receiver. In the white noise case the best encoding is such that the transmitted
signals themselves have the structure of a white noise with power P. The difficulty with
the approximate argument given for that case, and the reason it does not give a sharply
defined capacity, is that the selection of signals is not optimal. The distribution of amplitudes is not Gaussian as it should be.
4. Comparison of Ideal and Practical Systems.*
In Figure 4 the curve is the function
$$\frac{C}{W} = \log_2\left(1 + \frac{P}{N}\right)$$
plotted against P/N measured in db. It represents, therefore, the channel capacity per unit of band with white noise. The circle and points correspond to PCM and PPM systems used to send a sequence of binary digits and adjusted to give about one error in $10^5$ binary digits. In the PCM case the number adjacent to a point represents the number of amplitude levels; 3, for example, is a ternary PCM system. In all cases positive and negative amplitudes are used. The PPM systems are quantized with a discrete set of possible positions for the pulse, the spacing is 1/(2W), and the number adjacent to a point is the number of possible positions for a pulse.
The series of points follows a curve of the same shape as the ideal but displaced horizontally about 8 db. This means that with more involved encoding or modulation systems a gain of 8 db in power could be achieved over the system indicated.
*See Shannon, C. E., "A Mathematical Theory of Communication" and "Communication in the Presence of Noise."
Of course, as one attempts to approach the ideal, the transmitter and receiver re-
quired become more complicated and the delays increase. For these reasons there will
be some point where an economic balance is established between the various factors.
It may well be, however, that even at the present time more complex systems would be
justified.
A curious fact illustrating the general misanthropic behaviour of Nature is that at
both extremes of P/N (when we are well outside the practical range) the series of points
in Figure 4 approach more closely the ideal curve. At very large P/N the PCM points
approach to within about 4.5 db of the ideal, while with very small P/N the PPM
points approach to within 3 db. The relation
$$C = W \log\left(1 + \frac{P}{N}\right)$$
can be regarded as an exchange relation between the parameters W and P/N. Keeping the
channel capacity fixed, we can decrease the bandwidth W provided we increase P/N
sufficiently. Conversely, an increase in band allows a lower signal-to-noise ratio in the
channel. The required P/N in db is shown in Figure 5 as a function of the band W. It is
assumed here that as we increase W, N increases proportionally:
$$N = W N_0$$
where $N_0$ is the noise power per cycle of band. It will be noticed that if P/N is large a
reduction of band is very expensive in power. Halving the band roughly doubles the
signal-to-noise ratio in db. that is required.
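The exchange can be made concrete by solving $C = W \log_2(1 + P/N)$ for the required P/N at a fixed capacity. This is a sketch under the assumption of base-2 logarithms; the capacity value is purely illustrative, and because $N = W N_0$ the P/N figure already accounts for the growth of noise with band.

```python
import math

def required_snr_db(capacity: float, bandwidth: float) -> float:
    """P/N in db needed to reach `capacity` bits/s in `bandwidth` cycles/s,
    from C = W log2(1 + P/N) solved for P/N."""
    snr = 2 ** (capacity / bandwidth) - 1
    return 10 * math.log10(snr)

C = 10_000.0            # illustrative capacity in bits per second (assumed value)
for W in (4000.0, 2000.0, 1000.0):
    print(f"W = {W:6.0f}  required P/N = {required_snr_db(C, W):6.1f} db")
```

Going from W = 2000 to W = 1000 raises the requirement from about 15 db to about 30 db, the rough doubling described above; at small P/N the ratio is noticeably larger than 2.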
The channel capacity C can be calculated in many other cases. A general result that
applies in any situation where the average transmitter power is limited to P is that the
channel capacity is bounded by:
$$W \log\frac{P + N_1}{N_1} \le C \le W \log\frac{P + N}{N_1}$$
where $N_1$ is a parameter called the "entropy power" of the noise. It is defined as the
power in a white noise having the same entropy as the actual noise. N is, as before, the
average noise power.
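As a worked illustration of these bounds, the sketch below uses the standard expression for entropy power, $N_1 = e^{2h}/(2\pi e)$ with h the differential entropy per sample in natural units, and assumes (for illustration only) noise whose amplitudes are uniformly distributed on [-a, a]. The helper names are hypothetical.

```python
import math

def entropy_power(differential_entropy_nats: float) -> float:
    """Power of the white Gaussian noise with the same entropy:
    N1 = exp(2h) / (2*pi*e), with h in nats per sample."""
    return math.exp(2 * differential_entropy_nats) / (2 * math.pi * math.e)

def capacity_bounds(W, P, N, N1):
    """Lower and upper bounds on C in bits per second."""
    lower = W * math.log2((P + N1) / N1)
    upper = W * math.log2((P + N) / N1)
    return lower, upper

# Assumed example: noise amplitudes uniform on [-a, a].
a = 1.0
N = a ** 2 / 3                       # average noise power (variance)
N1 = entropy_power(math.log(2 * a))  # the uniform density has entropy log(2a)
print(N1 / N)                        # 6/(pi*e), roughly 0.70
print(capacity_bounds(W=1.0, P=10.0, N=N, N1=N1))
```

For Gaussian noise the entropy power equals the average power, so the two bounds coincide and reduce to $W \log(1 + P/N)$.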
REFERENCES
Nyquist, H., "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal,
April 1924, p. 324.
Nyquist, H., "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Transactions,
Vol. 47, April 1928, p. 617.
Hartley, R. V. L., "Transmission of Information," Bell System Technical Journal,
July 1928, p. 535.
Shannon, C. E., "A Mathematical Theory of Communication," Bell System Technical
Journal, July, October 1948.
Shannon, C. E., "Communication in the Presence of Noise," Proceedings of the I.R.E.
(Forthcoming).
Tuller, W. G., Sc.D. Thesis, Department of Electrical Engineering, Massachusetts
Institute of Technology, 1948.
Wiener, N., The Interpolation, Extrapolation and Smoothing of Stationary Time Series,
NDRC Report (Forthcoming as a book to be published by John Wiley and Sons, Inc.,
New York).
Wiener, N., Cybernetics. John Wiley and Sons, Inc., New York, 1948.
Bailey, R. D., and Singleton, H. E., "Reducing Transmission Bandwidth," Electronics,
August 1948, p. 107.
[Ml
Note on Certain Transcendental Numbers
Claude E. Shannon
This note calls attention to a certain class of
numbers that are easily shown to be transcendental but seem
to have escaped previous notice. A typical example is the
number
$$X = 2^{-2^{-2^{\cdot^{\cdot^{\cdot}}}}}$$
or more precisely $X = \lim_{n \to \infty} X_n$, where $X_{n+1} = 2^{-X_n}$, $X_0 = 2$. It is
easily seen that X exists and satisfies the equation $X = 2^{-X}$.
It is known from a conjecture of Hilbert, proved by Gelfond
and by Schneider, that $a^x$ is transcendental if $a \ne 0, 1$ is
algebraic and x is an algebraic irrational. Now X is clearly
not rational, and if we suppose it an algebraic irrational, then
$2^{-X}$, and therefore X itself, must be transcendental, a contradiction.
Hence X is transcendental.
More generally let f be a function such that if
x is algebraic and does not belong to a set S, then f(x) is
transcendental. Let $g_1$ and $g_2$ be algebraic functions
such that $x \ne g_1 f g_2 x$ for $x \in S$. Then the solutions of
$$x = g_1 f g_2 x$$
are transcendental by a similar argument, using the fact that
$g_1^{-1}$ is algebraic. If the sequence $X_n = (g_1 f g_2)^n X_0$ approaches
a limit X, it must be transcendental. Some functions known to
have the property required for f are $\sin x$, $e^x$ and $J_0(x)$, the
exceptional set S consisting of the number 0.
C. E. SHANNON
October 27, 1948
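A quick numerical check that the limit X exists can be made by iterating $X_{n+1} = 2^{-X_n}$. This is only a sketch: numerics say nothing about transcendence, and the starting value is an arbitrary choice (the map $2^{-x}$ is a contraction on [0, 1], so any start converges).

```python
def tower_limit(x0: float = 2.0, tol: float = 1e-14, max_iter: int = 1000) -> float:
    """Iterate X_{n+1} = 2**(-X_n) until successive values agree to within `tol`."""
    x = x0
    for _ in range(max_iter):
        nxt = 2.0 ** (-x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge")

X = tower_limit()
print(X)   # the fixed point of X = 2**(-X), about 0.6412
```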
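A CASE OF EFFICIENT CODING FOR A NOISY CHANNEL
Consider a discrete channel with two possible symbols
0 and 1. Noise is assumed to affect successive symbols
independently and in such a way that the probability of a symbol
being interpreted correctly at the receiver is $p = \frac{1}{2} + \epsilon$, while
the probability of incorrect interpretation is $q = \frac{1}{2} - \epsilon$. The
capacity of such a channel is (in natural units)
$$C = \left(\tfrac{1}{2} + \epsilon\right)\log(1 + 2\epsilon) + \left(\tfrac{1}{2} - \epsilon\right)\log(1 - 2\epsilon).$$
We assume $\epsilon$ very small and approximate $\log(1 + x)$ by $x - \frac{x^2}{2}$:
$$C \approx 2\epsilon^2 \quad \text{(natural units)}.$$
In bits per symbol, the capacity is
$$C \approx \frac{2\epsilon^2}{\log_e 2} \approx \frac{\epsilon^2}{0.35}.$$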
A very simple code can be constructed for this system
to send a sequence of random binary digits at nearly the rate C
with a quite small frequency of errors; in other words, a code
which is not far from the ideal. The code is merely to repeat
each binary digit in the message a large number n of times. At
the receiver, a group of n is received, and the majority report
is taken as the original message symbol.
If the message symbol is 0 then n 0's are transmitted.
At the receiver the n received symbols will be a mixture of
0's and 1's; the number of 0's present will be distributed
according to a binomial distribution with $p = \frac{1}{2} + \epsilon$ and $q = \frac{1}{2} - \epsilon$.
For large n the binomial distribution is approximately normal
(and this approximation is especially good when p is close to
1/2). The expected number of 0's is pn, and the standard
deviation is
$$\sigma = \sqrt{npq} \approx \frac{\sqrt{n}}{2}.$$
An error occurs when the number of received 0's is less than
n/2, i.e., when the actual number of zeros is $pn - \frac{n}{2} = \epsilon n$ away from
the expected number. In terms of $\sigma$ this is
$$a = \frac{\epsilon n}{\sqrt{n}/2} = 2\epsilon\sqrt{n}$$
standard deviations.
Hence the frequency of errors is given by the area of a normal
curve with standard deviation equal to unity from a out to $\infty$.
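To obtain a frequency of errors of about $10^{-3}$, say, we must
have $a \approx 3$, so that
$$n \approx \frac{9}{4\epsilon^2},$$
and the rate is $\frac{1}{n} \approx \frac{\epsilon^2}{2.3}$, as compared with the rate $\frac{\epsilon^2}{0.35}$ for the
ideal (with essentially zero frequency of errors).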
November 18,
C. E. SHANNON
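The capacity approximation and the repetition-code analysis can be checked numerically. This sketch assumes the parameterization p = 1/2 + ε, q = 1/2 - ε for the probabilities of correct and incorrect reception, computes the exact majority-vote error from the binomial distribution (in log space to avoid underflow), and compares it with the normal approximation using a = 2ε√n. Function names are hypothetical.

```python
import math

def capacity_bits(eps: float) -> float:
    """Exact capacity in bits/symbol when p = 1/2 + eps, q = 1/2 - eps."""
    p, q = 0.5 + eps, 0.5 - eps
    return 1.0 + p * math.log2(p) + q * math.log2(q)

def repetition_error_exact(eps: float, n: int) -> float:
    """Probability that a majority vote over n (odd) repeats is wrong,
    i.e. that at most (n-1)/2 of the n symbols arrive correctly."""
    p, q = 0.5 + eps, 0.5 - eps
    total = 0.0
    for k in range((n - 1) // 2 + 1):      # k = number of correctly received symbols
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(q))
        total += math.exp(log_term)
    return total

def repetition_error_normal(eps: float, n: int) -> float:
    """Normal approximation: area beyond a = 2*eps*sqrt(n) standard deviations."""
    a = 2 * eps * math.sqrt(n)
    return 0.5 * math.erfc(a / math.sqrt(2))

eps = 0.05
print(capacity_bits(eps), 2 * eps ** 2 / math.log(2))   # exact vs small-eps approximation
n = 901                                                 # odd, so there are no ties
print(repetition_error_exact(eps, n), repetition_error_normal(eps, n))
```

Comparing the code rate 1/n with `capacity_bits(eps)` for a target error frequency shows how far the simple repetition scheme falls short of the ideal.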
December 6, 1943
Note on Reversing A Discrete Markhoff Process
In "A Mathematical Theory of Communication" a
language was represented by a discrete Markhoff process with
a finite number of possible states. Such a stochastic process
can be represented schematically by means of an oriented linear
graph as in Fig. 1.
Consider the question of generating the same language
in reverse; for example, English but read backwards. Can we
always invert a finite state Markhoff process and obtain a
finite state Markhoff process? The answer is "yes" and further-
more the corresponding linear graph has the same topology, but
with reversed orientation on all branches. If the
original process has state probabilities $P_i$ and transition probabilities $p_i(j)$
(probability when in state i of going to state j), then the reverse process has the same
state probabilities and the transition probabilities given by:
$$q_j(i) = \frac{P_i\, p_i(j)}{P_j}.$$
This is true since this $q_j(i)$ is merely the a posteriori probability
for the original process that when in state j the preceding state
was state i. The inverse of Fig. 1 is shown in Fig. 2.
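It is interesting to show directly that the entropy
$H_R$ of the reverse process is equal to the entropy $H_F$ of the
forward process. Of course, this must be true a posteriori from
the general properties of entropy. We have
$$P_j\, q_j(i) = P_i\, p_i(j).$$
Hence
$$\sum_{i,j} P_j q_j(i) \log P_j q_j(i) = \sum_{i,j} P_i p_i(j) \log P_i p_i(j)$$
or
$$\sum_{i,j} P_j q_j(i) \log q_j(i) + \sum_{i,j} P_j q_j(i) \log P_j = \sum_{i,j} P_i p_i(j) \log p_i(j) + \sum_{i,j} P_i p_i(j) \log P_i.$$
Hence, since $\sum_i q_j(i) = \sum_j p_i(j) = 1$,
$$-H_R + \sum_j P_j \log P_j = -H_F + \sum_i P_i \log P_i,$$
and, the two sums over state probabilities being identical, $H_R = H_F$.
C. E. SHANNON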
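The reversal rule and the equality of entropies can be checked numerically on a small example. In this sketch the three-state transition matrix is an assumed illustration, not taken from the note; `q[j][i]` holds $q_j(i)$, the probability of going from state j to state i in the reversed process.

```python
import math

def stationary(p, steps=5000):
    """State probabilities P_i by repeated application of the transition matrix."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(steps):
        P = [sum(P[i] * p[i][j] for i in range(n)) for j in range(n)]
    return P

def reverse(p, P):
    """Reverse-process transition probabilities q_j(i) = P_i p_i(j) / P_j."""
    n = len(p)
    return [[P[i] * p[i][j] / P[j] for i in range(n)] for j in range(n)]

def entropy(p, P):
    """H = -sum_i P_i sum_j p_i(j) log2 p_i(j), in bits per symbol."""
    n = len(p)
    return -sum(P[i] * p[i][j] * math.log2(p[i][j])
                for i in range(n) for j in range(n) if p[i][j] > 0)

# Illustrative three-state process (assumed).
p = [[0.5, 0.5, 0.0],
     [0.0, 0.2, 0.8],
     [0.6, 0.0, 0.4]]
P = stationary(p)
q = reverse(p, P)
print(entropy(p, P), entropy(q, P))   # forward and reverse entropies agree
```

The zero pattern of `q` is the transpose of that of `p`, which is the "same topology, reversed orientation" statement of the note.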
Outline of Talk
American Statistical Society, December 28, 1949
INFORMATION THEORY
by
C. E. Shannon
Bell Telephone Laboratories, Inc., Murray Hill, N. J.
1. Information Produced by a Stochastic Process
In communication engineering, we are interested in
transmitting messages from one point to another. The messages
generally consist of a sequence of individual symbols, such as
the letters of printed English, which are governed by proba-
bilities. Thus, in English, there are the various letter fre-
quencies, digram frequencies, etc. The "meaning" of the
message (if any) is irrelevant to the engineering problem.
Abstractly, then, we may consider a message to be a sequence of
meaningless symbols produced by a suitable Stochastic process.
Communication systems must be designed to handle the ensemble
of possible messages; the particular one which will actually
occur is not known when the system is constructed. The source
producing messages is assumed to have only a finite number of
possible internal states.
2. Entropy as a Measure of Information
A suitable measure of the amount of information
produced by a discrete Stochastic process is given by the entropy
H, where
$$H = -\lim_{N \to \infty} \frac{1}{N} \sum p(x_1, \ldots, x_N) \log_2 p(x_1, \ldots, x_N)$$
in which $x_1, \ldots, x_N$ is a sequence of N symbols produced by
the process, $p(x_1, \ldots, x_N)$ is the probability of this sequence,
and the sum is over all sequences of this length.
The significance of the quantity H is that it is pos-
sible to translate messages from a source with entropy H into a
sequence of binary digits (0 or 1) using, on the average, $H + \epsilon$
binary digits per letter of the original message with any
positive $\epsilon$. It is not possible to translate so that fewer are
used. Thus H measures, in a sense, the equivalent number of
binary digits per letter of message. It can be shown that H
also determines the amount of channel capacity required for
transmission of the original messages.
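The limit defining H can be approached by brute force for a small source. This sketch assumes a two-symbol Markov source whose states are the symbols themselves (an illustrative choice, not from the outline) and computes the finite-N quantity $G_N = -\frac{1}{N}\sum p \log_2 p$ over all sequences of length N; for such a source $G_N = H + (H(P) - H)/N$, so it falls toward H as N grows.

```python
import itertools
import math

def block_entropy_rate(p, P, N):
    """G_N = -(1/N) * sum over all length-N sequences of prob * log2(prob),
    for a stationary Markov source with transition matrix p and state probs P."""
    total = 0.0
    for seq in itertools.product(range(len(P)), repeat=N):
        prob = P[seq[0]]
        for a, b in zip(seq, seq[1:]):
            prob *= p[a][b]
        if prob > 0:
            total -= prob * math.log2(prob)
    return total / N

# Illustrative two-symbol source (assumed): symbols tend to repeat.
p = [[0.9, 0.1],
     [0.3, 0.7]]
P = [0.75, 0.25]                  # stationary, since 0.75 * 0.1 == 0.25 * 0.3
H = -sum(P[i] * p[i][j] * math.log2(p[i][j]) for i in range(2) for j in range(2))
for N in (1, 2, 4, 8, 12):
    print(N, block_entropy_rate(p, P, N), H)
```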
One may also define the conditional entropy, $H_x(y)$, of one source relative to another. This
measures in a sense the uncertainty per letter of the y sequence
when the x sequence is known, or the amount of additional
information in the y sequence over that available in the x sequence.
$H_x(y)$ can be defined as follows:
$$H_x(y) = H(x, y) - H(x)$$
where H(x, y) is the entropy of the sequence whose elements are
the ordered pairs (x, y).
3. The Nature of Information
While the entropy H measures the amount of information
produced by a Stochastic process, it does not define the
information itself. Thus two entirely different sources might
produce information at the same rate (same H) but certainly they
are not producing the same information. If we translate the
output of a particular source into a different "language" by a
reversible operation, the translation may be said to have the
same information as the original. Thus we are led to consider
the information of a Stochastic process as that which is common
to all translations obtained from the given process by members
of the group G of reversible translations, or, alternatively, as
the equivalence class of all processes obtained from the given
one by such translations. To avoid certain paradoxical
situations, involving infinite internal storage in the transducer
doing the translating, it is desirable to first limit the group
G to translations possible in transducers having a finite
number of possible internal states. The information associated
with a process may be denoted by a single letter, say X. Thus
X = Y means that Y can be obtained by a translation of X, and
conversely. It is possible to set up a metric satisfying the
usual postulates as follows:
$$\rho(x, y) = 2H(x, y) - H(x) - H(y).$$
With this metric it is possible to define limiting sequences of
elements, each of which is an information. Thus a Cauchy
sequence, $X_1, X_2, \ldots$, is defined by requiring that
$$\lim_{m, n \to \infty} \rho(X_m, X_n) = 0.$$
The introduction of these sequences as new elements (analogous
to irrational numbers) completes the space in a satisfactory
way and enables one to simplify the statement of various results.
4. The Information Lattice
A relation of inclusion, $x \ge y$, between two information
elements x and y can be defined by
$$x \ge y \iff H_x(y) = 0.$$
This essentially requires that y can be obtained by a suitable
finite state operation (or limit of such operations) on x. If
$x \ge y$ we call y an abstraction of x. If $x \ge y$, $y \ge z$, then
$x \ge z$. If $x \ge y$, then $H(x) \ge H(y)$. Also $x > y$ means $x \ge y$,
$x \ne y$. The information element, one of whose translations is
the process which always produces the same symbol, is the 0
element, and $x \ge 0$ for any x.
The sum of two information elements, z = x + y, is the
process which produces the ordered pairs $(x_n, y_n)$. We have
$$z \ge x, \quad z \ge y,$$
and there is no u < z with these properties; z is the least upper
bound of x and y.
The product z = xy is defined as the largest z such
that $x \ge z$, $y \ge z$; that is, there is no u > z which is an abstraction
of both x and y. The product is unique.
With these definitions information elements form a
metric lattice. The lattice is not distributive, nor even
modular. A non-distributive example is x, y independent
sequences of binary digits, with z the sequence obtained by
mod 2 addition of corresponding symbols in x and y. Then
$$zy + zx = 0 + 0 = 0,$$
$$z(x + y) = z \ne 0.$$
The lattices are relatively complemented. For $x \le y$ there
exists a z with
$$z + x = y, \quad zx = 0.$$
The element z is not, in general, unique.
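The metric and the non-distributive example can be computed for the simplest case. This sketch assumes each information element is a sequence of independent symbols, each a function of one underlying pair of fair bits (a, b), so per-symbol entropies suffice; it computes $\rho$ and $H_x(y)$ but does not compute the lattice product itself. The names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumed model: every element is a function of one underlying symbol drawn
# uniformly from OMEGA, with successive symbols independent.
OMEGA = list(product((0, 1), repeat=2))      # (a, b): two independent fair bits

def H(*elements):
    """Joint entropy in bits of the tuple of element values."""
    counts = Counter(tuple(f(w) for f in elements) for w in OMEGA)
    n = len(OMEGA)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def cond(x, y):
    """H_x(y) = H(x, y) - H(x); zero exactly when x >= y."""
    return H(x, y) - H(x)

def rho(x, y):
    """rho(x, y) = 2H(x, y) - H(x) - H(y)."""
    return 2 * H(x, y) - H(x) - H(y)

x = lambda w: w[0]
y = lambda w: w[1]
z = lambda w: w[0] ^ w[1]            # mod 2 sum of corresponding symbols
x_plus_y = lambda w: (w[0], w[1])    # the sum x + y: ordered pairs

print(rho(x, z), cond(z, y), cond(x_plus_y, z))
```

Here z is independent of x and of y (so zx and zy are both 0), yet $H_{x+y}(z) = 0$, i.e. z is an abstraction of x + y and z(x + y) = z.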
5. The Delay Free Group $G_1$
The definition of equality for information based on
the group G allows x = y when y is, for example, a delayed
version of x: $y_n = x_{n-1}$. In some situations, when one must
act on information at a certain time, a delay is not permis-
sible. In such a case we may consider the more restricted
group of instantaneously reversible translations. One may
define inclusion, sum, product, etc., in an analogous way, and
this also leads to a lattice but of much greater complexity
and with many different invariants.
Proof of an Integration Formula
C. E. Shannon
The integral
0 sin2 x 2 sin^ or
has arisen in an acoustical problem. It has been evaluated for N = 1, 2, 3, 4 as
equal to
gN (a) = a N + 2 i— r-1 sin 2 i a (2)
(-1 '
by R. C. Jones, and he has conjectured that $f_N = g_N$ for all $\alpha$, N. A general
proof follows.
From (1) we have
. , . , „, . 1 f ° cos lNx-2 cos 2(W - 1)* + cos 2W - 2) x .
A2*, -h ~ Tfn-1 + In -2 = ~ y J0 L^T^ ^
and
d a2 , , , cos 2Ate - 2 cos 2flV - l)a + cos2(A^ - 2)a
— AW»(«) y^ (3)
Also from (2)
Aiv = a + 2
(-1 '
2 _ sin 2(AT - 1) a
AN. AT ftV(a) N~^\
tit.N gsw = 2 cos 2(N - 1) a (4)
The equality of (3) and (4) can be established by noting that the numerator of (3),
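$$\begin{aligned}
\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha
&= \operatorname{Re}\left[e^{j2N\alpha} - 2e^{j2(N-1)\alpha} + e^{j2(N-2)\alpha}\right] \\
&= \operatorname{Re}\left[e^{j2(N-1)\alpha}\left(e^{j2\alpha} - 2 + e^{-j2\alpha}\right)\right] \\
&= \operatorname{Re}\left[e^{j2(N-1)\alpha}\left(e^{j\alpha} - e^{-j\alpha}\right)^2\right] \\
&= \operatorname{Re}\left[e^{j2(N-1)\alpha}\,(2j\sin\alpha)^2\right] \\
&= -\operatorname{Re}\left[4\sin^2\alpha\; e^{j2(N-1)\alpha}\right] = -4\sin^2\alpha\,\cos 2(N-1)\alpha.
\end{aligned}$$
Hence (3) and (4) agree; but $\Delta^2 g_N(0) = \Delta^2 f_N(0) = 0$, so that
$$\Delta^2 g_N(\alpha) = \Delta^2 f_N(\alpha).$$
Also it has been verified that
$$g_1(\alpha) = f_1(\alpha), \qquad g_2(\alpha) = f_2(\alpha).$$
Hence, since $\Delta^2 f_N = f_N - 2f_{N-1} + f_{N-2}$ (and likewise for $g_N$), it follows
in general, by induction on N, that $f_N(\alpha) = g_N(\alpha)$ for all N.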
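The trigonometric identity at the heart of the proof is easy to spot-check numerically. This sketch tests only that identity at random points; it does not evaluate the integral (1) or the formula (2).

```python
import math
import random

def lhs(N: int, alpha: float) -> float:
    """cos 2N a - 2 cos 2(N-1) a + cos 2(N-2) a"""
    return (math.cos(2 * N * alpha) - 2 * math.cos(2 * (N - 1) * alpha)
            + math.cos(2 * (N - 2) * alpha))

def rhs(N: int, alpha: float) -> float:
    """-4 sin^2 a cos 2(N-1) a"""
    return -4 * math.sin(alpha) ** 2 * math.cos(2 * (N - 1) * alpha)

random.seed(1)
worst = max(abs(lhs(N, a) - rhs(N, a))
            for N in range(2, 20)
            for a in (random.uniform(-3, 3) for _ in range(200)))
print(worst)   # should be at floating-point rounding level
```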
A &leit*l ****** »t fr^Mlttltac lafonttttoa
2t Is p*«*lM* fey ¥fe*l*u# of eodulaUoe to Xmr
pjroto oao tutpmt of e oystos for *jr&»o*iUia£ Iafor»*Uoa at too
OXpoooo Of otters. Mi« T*risro« car.atmeo *tic* mj se exoasuigfg
i, uaitty of rocoivo* oigoel, ftiiica ess bo rou^iJ/
SMMMHtrwS la *««HM» t>/ S&0 tO £13 1 00
-
ratio*
£• TtttiiBZi 2 1%9? yc**r»p.
S. tlm of troossUooi£A»
ft. BoiOO 4*4 t&O OJKfeOtt*
aoooroX tteojr* of bow tfeooo voriofcioo oro roiotoO «*4 tSm
liivwi»«d oafi will oe &«volopo4 la a forthoofclas soaorwifim.
Bo»oo«r «poofcitt& x-.Ht*M/ *&4 oa&or « sus&ber of oojJUioay 0001*09- -
f ol2ooXm« e^ufitioos
a ■ f if y 10 {*)
3 * « aooouro Of 4ii*t0rtiGji at tftt **««tv*r
t * *f trooonlooiaa
* • bsaa iriiia ©f tro-ts&ittor
ST * aciso j-«w«T £*30|t?fl ti:«t 1» t&O O&iOO ?OW*r
p#r *Ait tw?.i4 oil Hi, *>*«&r*e» tolas
alalia *s flfci is toe rofii«» u^At-? *fi>.:mlaar*tioa
yjUUi ftmi tautt koojMtag rooolToft <|ooli*jr istojr&ottt
oo aor 0100010 t, F «M £ 1a r*rio*»o o*> loo* ft* oo
kooo tl*o gpam ©f t&« foooHoo*
r 1 21
«fcoro £«* an£ % or« too WUl triuioatttor tatar ao4 acl«o
QJQjSgf, **ria« too traaftftlsalast tiao. ^» fcr •sa«pl« t/jr to-
oroosiog btutf wUto oo ooo eoorofioo tra&o&ittor - tU«
m&a&m&t 10 la «a« ooaoo vor* foooroolo »iae* It lit « log-
aritt.ai« *moj o**lag aulto or boaA oJUitfc AlvMoo t&o o*or«r
»jf a ft* tor.
»ro two »*tbfld« of fetter Sag o1&ao1 *» aaloo rotlo «t too ox»«ooo
of boo* «i*to. BoltOor of titooo Jkwovo* Is by oor msw* eftUud
l& too ozobooso. Sfco $roooal aoKomotoa toooriooo o sow ootfaoo
at its t&Uft oosootlollr too aoxtwai e*oias of olgool
pmm* io oofelovoi for o $lm oo** wlata laero*oo* &U 4coo
not «oo£ toot «t« ftfotoa of troaoaiooieo lo • tooorotioaHf
Uool ono for tkoro oro oororol otHor aooo* of iss$*miM* ro-
ooivoi qooJLU* fcooola* f . *. ? *o& * flxoi - «**t tfclo oro too
to to yWlt m ooarlr tAool oireonago roto ootooo* too
anlM 1m Oaa^L fift Um of OOOlloo fcfa* YOl&OC
of too lopot ytoolotlag fomoUoa (too o$oooa faootloo la tolo-
saoao oaa roftle) ot o 00300000 of rofolorXr ooboo* oooyllat
Thus t«8 + 4~£**l ,
Oi *5 --« 4-4-2 + 1
A transmitter for this system could be built in the
following way. A condenser is charged as usual to the sampled
voltage. This voltage is read on a comparator biased up to
half the amount. If the comparator gives a positive indication,
an electronic switch is closed feeding a negative pulse of $2^n$
units of charge into the condenser; if not, a positive pulse of
$2^n$ units is fed in. The comparator is now switched to control
a new pulse source which produces pulses of $2^{n-1}$ units and the
process is repeated. Thus the circuit feeds in positive or
negative pulses of decreasing magnitude, "hunting" for a balance.
At each stage a recorder remembers whether a positive or negative
pulse was used. These positive and negative recordings actually
are the binary representation of the original voltage, as one
can see by reading the above table with -1's replaced by 0. Hence
the receiver of Fig. 4 can be used without alteration in this
system.
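The "hunting" procedure is what would now be called non-restoring successive approximation. The sketch below models it in integer charge units, assuming the comparator threshold corresponds to a residual of zero; it is a software illustration of the sign-recording scheme, not a model of the circuit. It reproduces expansions such as $-5 = -8 + 4 - 2 + 1$.

```python
def hunt(v: int, n: int):
    """Balance a charge v with pulses of 2**n, 2**(n-1), ..., 1 units.

    At each stage the comparator looks at the remaining charge: if it is
    positive a negative pulse is fed in, otherwise a positive one.  The
    recorded digit is the sign with which 2**k enters the expansion of v.
    """
    residual = v
    digits = []
    for k in range(n, -1, -1):
        if residual > 0:
            residual -= 2 ** k      # negative pulse: 2**k counted with a + sign
            digits.append(+1)
        else:
            residual += 2 ** k      # positive pulse: 2**k counted with a - sign
            digits.append(-1)
    return digits, residual

def to_binary(digits):
    """Replace each -1 by 0, giving an ordinary binary word."""
    return [1 if d > 0 else 0 for d in digits]

digits, left = hunt(-5, 3)
print(digits, left, to_binary(digits))   # [-1, 1, -1, 1] 0 [0, 1, 0, 1]
```

For odd v between -15 and 15 with n = 3, the binary word read off this way equals (v + 15)/2, so the sign record is an ordinary binary code for an offset version of the sample.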
- £723
Creative Thinking
Up to 100% of the population, we plot the amount of ideas produced, that is,
useful good ideas, by these individuals, who are supposed to be arranged
in order of increasing ability at producing ideas. We find a
curve something like this. Consider the number of ideas produced
here, going up to an enormous height here.
A very small percentage of the population produces the
greatest proportion of the important ideas. This is akin to an
idea presented by an English mathematician, Turing, that the human
brain is something like a piece of uranium. If the uranium is
below the critical mass and you shoot one neutron into it, a few
additional neutrons would be produced by impact; but if you increase
the size of the uranium beyond that point, it leads to an extremely
explosive reaction. Turing says this is something like ideas in the human
brain. There are some people who, if you shoot one idea into the brain,
will give you half an idea out. There are other people who are
beyond this point, at which they produce two ideas for each idea
sent in. Those are the people beyond the knee of the curve. I
don't want to sound egotistical here; I don't think that I am
beyond the knee of this curve and I don't know anyone who is. I
do know some people who were. I think, for example, that anyone
will agree that Isaac Newton would be well on the top of this
curve. When you think that at the age of 25 he had produced enough
science, physics and mathematics to make 10 or 20 men famous - he
produced the binomial theorem, differential and integral calculus, laws
of gravitation, laws of motion, decomposition of white light, and
so on. Now what is it that shoots one up to this
part of the curve? What are the basic requirements? I think we
could set down three things that are fairly necessary for scien-
tific research or for any sort of inventing or mathematics or
physics or anything along that line. I don't think a person can
get along without any one of these three.
The first one is obvious - training and experience.
You don't expect a lawyer, however bright he may be, to give you
a new theory of physics these days or mathematics or engineering.
The second thing is a certain amount of intelligence or
talent. In other words, you have to have an IQ that is fairly high to do
good research work. I don't think that there is any good engineer
or scientist that can get along on an IQ of 100, which is the
average for human beings. In other words, he has to have an IQ
higher than that. Everyone in this room is considerably above
that. Training, we might say, is a matter of environment; intelligence
is a matter of heredity.
Those two I don't think are sufficient. I think there is
a third constituent here, a third component which is the one that
makes an Einstein or an Isaac Newton. For want of a better word,
we will call it motivation. In other words, you have to have some
kind of a drive, some kind of a desire to find out the answer, a
desire to find out what makes things tick. If you don't have that,
you may have all the training and intelligence in the world, but you
don't have the questions and you just won't find answers. This is a
hard thing to put your finger on. It is a matter of temperament
probably; that is, a matter of probably early training and early childhood
experiences, whether you will be motivated in the direction of scientific
research. I think that at a superficial level, it is a blend of several things.
This is not any attempt at a deep analysis at
all, but my feeling is that a good scientist has a great deal of what
we can call curiosity. I won't go any deeper into it than that. He
wants to know the answers. He's just curious how things tick and he
wants to know the answers to questions; and if he sees things, he wants
to raise questions and he wants to know the answers to those.
Then there's the idea of dissatisfaction. By this I don't
mean a pessimistic dissatisfaction of the world - we don't like the
way things are - I mean a constructive dissatisfaction. The idea
could be expressed in the words, "This is OK, but I think things could
be done better. I think there is a neater way to do this. I think
things could be improved a little." In other words, there is continually
a slight irritation when things don't look quite right; and
I think that dissatisfaction in present days is a key driving force
in good scientists.
And another thing I'd put down here is the pleasure in seeing
neat results or neat methods of arriving at results, neat designs of
engineering equipment, and so on. I get a big bang myself out of proving
a theorem. If I've been trying to prove a mathematical theorem for
a week or so and I finally find the solution, I get a big bang out of
it. And I get a big kick out of seeing a clever way of doing some
engineering problem, a clever design for a circuit which uses a very
small amount of equipment and gets apparently a great deal of result
out of it. I think so far as motivation is concerned, it is maybe a
little like Fats Waller said about swing music - either you got it or
you ain't. If you ain't got it, you probably shouldn't be doing re-
search work if you don't want to know that kind of answer. Although
people without this kind of motivation might be very successful in
other fields, the research man should probably have an extremely
strong drive to want to find out the answers, so strong a drive that
he doesn't care whether it is 5 o'clock - he is willing to work all
night to find out the answers and all weekend if necessary. Well
now, this is all well and good, but supposing a person has these
three properties to a sufficient extent to be useful, are there any
tricks, any gimmicks that he can apply to thinking that will actually
aid in creative work, in getting the answers in research work, in gen-
eral, in finding answers to problems? I think there are, and I think
they can be catalogued to a certain extent. You can make quite a list
of them and I think they would be very useful if one did that, so I
am going to give a few of them which I have thought up or which peo-
ple have suggested to me. And I think if one consciously applied
these to various problems you had to solve, in many cases you'd find
solutions quicker than you would normally or in cases where you might
not find it at all. I think that good research workers apply these
things unconsciously; that is, they do these things automatically;
and if they brought them forth into conscious thinking ("here's
a situation where I would try this method of approach"), they would
probably get there faster, although I can't document this statement.
The first one that I might speak of is the idea of sim-
plification. Suppose that you are given a problem to solve, I don't
care what kind of a problem - a machine to design, or a physical
theory to develop, or a mathematical theorem to prove, or some-
thing of that kind - probably a very powerful approach to this
is to attempt to eliminate everything from the problem except the
essentials; that is, cut it down to size. Almost every problem
that you come across is befuddled with all kinds of extraneous
data of one sort or another; and if you can bring this problem
down into the main issues, you can see more clearly what you're
trying to do and perhaps find a solution. Now, in so doing, you
may have stripped away the problem that you're after. You may have
simplified it to a point that it doesn't even resemble the problem
that you started with; but very often if you can solve this simple
problem, you can add refinements to the solution of this until you
get back to the solution of the one you started with.
A very similar device is seeking similar known problems.
I think I could illustrate this schematically in this way. You
have a problem P here and there is a solution S which you do not know
yet, perhaps over here. If you have experience in the field
represented, that you are working in, you may perhaps know of a somewhat
similar problem, call it P' , which has already been solved and
which has a solution, S'. All you need to do - all you may have
to do is to find the analogy from P' here to P and the same analogy
from S' to S in order to get back to the solution of the given prob-
lem. This is the reason why experience in a field is so important
that if you are experienced in a field, you will know thousands of
problems that have been solved. Your mental matrix will be filled
with P's and S's unconnected here and you can find one which is
tolerably close to the P that you are trying to solve and go over
to the corresponding S' in order to go back to the S you're after.
It seems to be much easier to make two small jumps than the one big
jump in any kind of mental thinking.
Another approach for a given problem is to try to restate
it in just as many different forms as you can. Change the words.
Change the viewpoint. Look at it from every possible angle. After
you've done that, you can try to look at it from several angles at
the same time and perhaps you can get an insight into the real basic
issues of the problem, so that you can correlate the important fac-
tors and come out with the solution. It's difficult really to do
this, but it is important that you do. If you don't, it is very
easy to get into ruts of mental thinking. You start with a problem
here and you go around a circle here and if you could only get over
to this point, perhaps you would see your way clear; but you can't
break loose from certain mental blocks which are holding you in
certain ways of looking at a problem. That is the reason why very
frequently someone who is quite green to a problem will sometimes
come in and look at it and find the solution like that, while you
have been laboring for months over it. You've got set into some
ruts here of mental thinking and someone else comes in and sees it
from a fresh viewpoint.
Another mental gimmick for aid in research work, I think,
is the idea of generalization. This is very powerful in mathematical
research. The typical mathematical theory develops in the following way:
someone proves a very isolated, special result, a particular theorem, and
someone else always will come along and start generalizing it. Where it
was in two dimensions, he will do it in N dimensions; or if it was in some
kind of algebra, he will work in a general algebraic field; if it was in the
field of real numbers, he will change it to a general algebraic field or
something of that sort. This is actually quite easy to do if you only
remember to do it. The minute you've found an answer to something, the next
thing to do is to ask yourself if you can generalize this any more: can I
make a broader statement which includes more? There, I
think, in terms of engineering, the same thing should be kept in mind.
As you see, if somebody comes along with a clever way of doing some-
thing, one should ask oneself "Can I apply the same principle in
more general ways? Can I use this same clever idea represented here
to solve a larger class of problems? Is there any place else that
I can use this particular thing?"
Next one I might mention is the idea of structural analysis
of a problem. Supposing you have your problem here and a solution
here. You may have too big a jump to take. What you can try to
do is to break down that jump into a large number of small jumps.
If this were a set of mathematical axioms and this were a theorem
or conclusion that you were trying to prove, it might be too much
for me to try to prove this thing in one fell swoop. But perhaps
I can visualize a number of subsidiary theorems or propositions
such that if I could prove those, in turn I would eventually arrive
at this solution. In other words, I set up some path through this
domain with a set of subsidiary solutions, 1, 2, 3, 4, and so on,
and attempt to prove this on the basis of that and then this on the
basis of these which I have proved until eventually I arrive at the
solution S. Many proofs in mathematics have been actually found by
extremely roundabout processes. A man starts to prove this theorem
and he finds that he wanders all over the map. He starts off and
proves a good many results which don't seem to be leading anywhere
and then eventually ends up by the back door on the solution of the
given problem; and very often when that's done, when you've found
your solution, it may be very easy to simplify; that is, to see at
one stage that you may have short-cutted across here and you could
see that you might have short-cutted across there. The same thing
is true in design work. You may design a way of doing something
which is obviously clumsy and cumbersome and uses too much equipment;
but after you've really got something you can get a grip on,
something you can hang on to, you can start cutting out components and
see that some parts were really superfluous. You really didn't need
them in the first place.
Now one other thing I would like to bring out which I
run across quite frequently in mathematical work is the idea of
inversion of the problem. You are trying to obtain the solution
S on the basis of the premises P and then you can't do it. Well,
turn the problem over: suppose that S were the given proposition,
the given axioms, or the given numbers in the problem and what you
are trying to obtain is P. Just imagine that that were the case.
Then you will find that it is relatively easy to solve the problem
in that direction. You find a fairly direct route. If so, it's
often possible to invert it in small stages. In other words, you've
got a path marked out here going the other way. You can see how to
invert these steps in small stages, with perhaps only three or four
difficult steps in the proof.
Now I think the same thing can happen in design work.
Sometimes I have had the experience of designing computing machines
of various sorts in which I wanted to compute certain numbers out of
certain given quantities. This happened to be a machine that played
the game of nim and it turned out that it seemed to be quite diffi-
cult. It took quite a number of relays to do this particular calcu-
lation although it could be done. But then I got the idea that if
I inverted the problem, it would have been very easy to do - if the
given and required results had been interchanged; and that idea led
to a way of doing it which was far simpler than the first design.
The way of doing it was doing it by feedback; that is, you start with
the required result and run it back until - run it through its value
i
!
10
until it matches the given input. So the machine itself was worked
backward putting range S over the numbers until it had the number
that you actually had and, at that point, until it reached the num-
ber such that P shows you the correct way. Well, now the solution
for this philosophy which is probably very boring to most of you.
I*d like now to show you this machine which I brought along and go
into one or two of the problems which were connected with the design
of that because I think they illustrate some of these things I've been
talking about.
In order to see this, you'll have to come up around it; so,
I wonder whether you will all come up around the table now.
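The feedback idea in the talk, running a candidate result through its values until the forward computation matches the given input, can be sketched in software. This is a hypothetical illustration for ordinary nim (a move takes any number of objects from one heap), assuming the forward computation is the nim-sum (bitwise XOR) of the heap sizes; it is not a reconstruction of Shannon's relay machine. `find_move_by_search` tries candidate heap sizes in turn, and `find_move_directly` computes the same answer in closed form for comparison; both names are ours.

```python
from functools import reduce
from operator import xor
from typing import List, Optional, Tuple

def nim_sum(heaps: List[int]) -> int:
    """Forward computation: XOR of all heap sizes (zero means a losing position)."""
    return reduce(xor, heaps, 0)

def find_move_by_search(heaps: List[int]) -> Optional[Tuple[int, int]]:
    """Inverted problem: for each heap, run a candidate new size s upward from 0
    and stop as soon as the forward computation reports a zero nim-sum."""
    for i, size in enumerate(heaps):
        for s in range(size):  # a move must remove at least one object
            if nim_sum(heaps[:i] + [s] + heaps[i + 1:]) == 0:
                return i, s
    return None  # no winning move: the position is already a loss

def find_move_directly(heaps: List[int]) -> Optional[Tuple[int, int]]:
    """Closed-form answer: replace heap i by heaps[i] ^ x where that is smaller."""
    x = nim_sum(heaps)
    if x == 0:
        return None
    for i, size in enumerate(heaps):
        if (size ^ x) < size:
            return i, size ^ x
    return None

print(find_move_by_search([3, 4, 5]), find_move_directly([3, 4, 5]))  # (0, 1) (0, 1)
```

The search version needs only the forward computation, which is the point of the inversion: the hard direction is replaced by repeated use of the easy one.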
Bell Telephone Laboratories, Incorporated
Cover Sheet for Technical Memorandum
Subject: The Relay Circuit Analyzer - Case 22103
MM-53-1400-9
MM-53-1800-17
Date: March 31, 1953
Authors: C. E. Shannon, E. F. Moore
Filing subject: Switching Theory
ABSTRACT
This memorandum describes a machine (made of
relays, selector switches, gas diodes, and germanium diodes)
for analyzing several properties of any combinational relay
circuit which uses four relays or fewer.
This machine, called the relay circuit analyzer,
contains an array of switches on which the specifications
that the circuit is expected to satisfy can be indicated, as
well as a plugboard on which the relay circuit to be analyzed
can be set up.
The analyzer can (1) verify whether the circuit
satisfies the specifications, (2) make certain kinds of
attempts to reduce the number of contacts used, and also
(3) perform rigorous mathematical proofs which give lower
bounds for the numbers and types of contacts required to
satisfy given specifications.
The Relay Circuit Analyzer - Case 22103
MEMORANDUM FOR FILE
1. Introduction
Some operations which assist in the design of relay
circuits or other types of switching circuits can be described
in very simple form, and machines can be constructed which per-
form them more quickly and more accurately than a human being
can. It seems possible that machines of this type will be use-
ful to those whose work involves the design of such circuits.
This is the first of two memoranda describing particular
machines of this kind which have been built.
The present machine, called the relay circuit
analyzer, is intended for use in connection with the design of
two terminal circuits made up of contacts on at most four relays.
The principles upon which this machine is based are
not limited to two terminal networks or to four relays, although
an enlarged machine would require more time to operate. Each
addition of one relay to the circuits considered would approxi-
mately double the size of the machine and quadruple the length
of time required for its operation.
This type of machine is not applicable to sequential
circuits, however, so it will be of use only in connection with
parts of the relay circuits which contain contacts, but no relay
coils.
2. Operation of the Machine
The machine, as can be seen from Photograph 196492,
contains sixteen 3-position switches, which are used to specify
the requirements of the circuit. One switch corresponds to each
of the $2^4 = 16$ states in which the four relays can be put. Switch
No. 2 in the upper righthand corner, for instance, is labeled
W + X + Y' + Z, which corresponds to the state of the circuit
in which the relays labeled W, X, and Z are operated, and the
relay labeled Y is released.
The three positions of this switch correspond to the
requirements which can be imposed on the condition of the cir-
cuit when the relays are in the corresponding state. Since any
single relay contact circuit assumes only one of two values
(open or closed), the inclusion of a third value (doesn't matter,
don't care, or vacuous, as it has been called by various per-
sons) merits some explanation. If the machine, of which the
relay circuit being designed is to be a part, only permits these
relays to take on a fraction of the $2^n$ combinations of which n
relays are capable, then (except when considering what the mach-
ine will do in case of relay failures) any circuits which agree
on the combinations actually assumed will be equivalent in their
properties. Since the class of circuits which agree with what
is wanted just in the necessary combinations is larger than the
class of those which agree in all combinations, the former
class can and frequently will contain members using fewer con-
tacts. Hence the switch corresponding to each state is put
into the don't care position if the circuit will never assume
that state, or if for any other reason the behavior when in
that state is immaterial. The sixteen 3-position switches thus
permit the user not only to require the circuit under consid-
eration to have exactly some particular hindrance function, but
also allow the machine more freedom in the cases where the cir-
cuit need not be specified completely.
In order to make a machine of this type to deal
with n relays (this particular machine was made for the case
n = 4), $2^n$ such switches would be required, corresponding to
the $2^n$ states n relays can assume. In each of these states
the circuit can be either open or closed, so there are $2^{2^n}$
functionally distinct circuits. But since each switch has
3 positions, there are $3^{2^n}$ distinct circuit requirements spec-
ifiable on the switches, which in the case n = 4 amounts to
43,046,721. Thus, the number of problems which the analyzer
must deal with is quite large, even in the case of only four
relays.
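The counts in the preceding paragraph are three exponentials in n. The short Python sketch below (the function name `specification_counts` is ours) reproduces them for the four-relay case.

```python
def specification_counts(n: int) -> dict:
    """Counts for a machine of this type handling n relays."""
    states = 2 ** n  # one three-position switch per relay state
    return {
        "switches": states,
        "functions": 2 ** states,       # each state is either open or closed
        "specifications": 3 ** states,  # open, closed, or don't care per switch
    }

print(specification_counts(4))
# prints: {'switches': 16, 'functions': 65536, 'specifications': 43046721}
```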
The left half of the front panel of the machine (See
Photograph No. 196492) is a plugboard on which the circuit be-
ing analyzed can be represented. There are three transfers
from each of the four relays, W, X, Y, and Z brought out to
jacks on this panel, and two plugs representing the terminals
of the network are at the top and bottom. Using these, as
well as some patch cords, it is possible to plug up any cir-
cuit using at most three transfers on each of the four relays.
This number of contacts is sufficient to give a circuit repre-
senting any switching function of four variables.
If the specifications for the circuit have been put
on the sixteen switches, and if the circuit has been put on
the plugboard, the relay circuit analyzer is then ready to
operate. With the main control switch and the evaluate-compare
switch both in the evaluate position, pressing the start
button will cause the analyzer to evaluate the circuit plugged
in, that is, to indicate in which of the states the circuit is
closed by lighting up the corresponding indicator lamps.
When the evaluate-compare switch is turned to the compare
position, the analyzer checks whether the circuit
disagrees with the requirements given on the switches. A
disagreement is indicated by lighting the lamp corresponding to
the state in question. If the switch is set for closed and the
actual circuit is open in that state, or vice versa, a
disagreement is indicated, but no disagreement is ever registered
for a state whose switch is in the don't care position.
If the main control switch is then turned
to the short test position and the start button is pressed again,
the analyzer determines whether any of the contacts in this
circuit could have been shorted out, with the circuit still
satisfying the requirements. The machine indicates on the lamps
beside the contacts which ones have this property.
It may seem surprising to the reader that anyone would
ever need the assistance of a machine to find a contact which
could be shorted out without affecting the circuit. While such
help is certainly unnecessary for a number of simple examples,
in more complicated circuits redundant elements are often far
from obvious, particularly if there are some states for which
the switches are in the don't care position, since the simplified
circuit may differ from the original only in the don't care
states. It is often quite difficult to see the simplification
in these cases.
The analyzer is also helpful in case the circuit being
analyzed is a bridge, because of the complications involved
in tracing out all paths in the bridge. The circuit shown in
Figure 1 is an example of a circuit which was not known to be
inefficiently designed until put on the analyzer. It determined
in less than two minutes (including the time required to plug
the circuit into the plugboard) that one of the contacts shown
can be shorted out. How likely would a human being be to solve
this same problem in the same length of time?
After the short test has been performed, putting
the main control switch in the open test position permits the
analyzer to perform another analogous test, this time opening
the contacts one at a time.
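The evaluate, compare, short, and open tests described above can be sketched in software for comparison. This is a hypothetical analogue, not the analyzer's relay circuit: the contact-list encoding, the node numbers standing in for plugs P1 and P2, and all function names are assumptions, and each test simply re-evaluates the whole network for all sixteen states by a graph search.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

RELAYS = "WXYZ"
# Hypothetical encoding: (node, node, relay letter, True for a make contact)
Contact = Tuple[int, int, str, bool]
TERMINAL_1, TERMINAL_2 = 0, 1  # stand-ins for the plugs P1 and P2

def operated(state: int, relay: str) -> bool:
    """A state is a 4-bit number read as WXYZ, so 0b0001 means only Z operated."""
    return bool((state >> (3 - RELAYS.index(relay))) & 1)

def is_closed(contacts: List[Contact], state: int,
              override: Optional[Dict[int, bool]] = None) -> bool:
    """Evaluate: is there a path of closed contacts between the terminals?
    `override` forces chosen contacts shorted (True) or opened (False)."""
    override = override or {}
    neighbours: Dict[int, List[int]] = {}
    for i, (a, b, relay, is_make) in enumerate(contacts):
        if override.get(i, operated(state, relay) == is_make):
            neighbours.setdefault(a, []).append(b)
            neighbours.setdefault(b, []).append(a)
    seen, queue = {TERMINAL_1}, deque([TERMINAL_1])
    while queue:  # breadth-first search from one terminal toward the other
        node = queue.popleft()
        if node == TERMINAL_2:
            return True
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def disagreements(contacts, spec, override=None) -> List[int]:
    """Compare: states whose requirement (True/False, None = don't care) fails."""
    return [s for s in range(16)
            if spec[s] is not None and is_closed(contacts, s, override) != spec[s]]

def short_test(contacts, spec) -> List[int]:
    """Indices of contacts that can be shorted with the spec still satisfied."""
    return [i for i in range(len(contacts)) if not disagreements(contacts, spec, {i: True})]

def open_test(contacts, spec) -> List[int]:
    """Indices of contacts that can be opened with the spec still satisfied."""
    return [i for i in range(len(contacts)) if not disagreements(contacts, spec, {i: False})]

# W in parallel with a redundant W-X series path; the requirement is "closed iff W".
circuit = [(0, 1, "W", True), (0, 2, "W", True), (2, 1, "X", True)]
spec = [operated(s, "W") for s in range(16)]
print(disagreements(circuit, spec), short_test(circuit, spec), open_test(circuit, spec))
# prints: [] [2] [1, 2]
dont_care = [None if (s & 0b1100) == 0b0100 else want for s, want in enumerate(spec)]
print(short_test(circuit, dont_care))  # prints: [1, 2]
```

The last line marks the four states with W released and X operated as don't care; contact 1 then also becomes removable by shorting, the kind of simplification that the text notes is hard to see by eye.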
These two particular types of circuit changes were
chosen because they are easy to carry out, and whenever suc-
cessful, either one reduces the number of contacts required.
There are other types of circuit simplification which it might
be desirable to have a machine perform, including various
rearrangements of the circuit. These would have required
more time as well as more equipment to perform, but would
probably have caused the machine to be more frequently suc-
cessful in simplifying the circuit. Using such techniques,
it might be possible to build a machine which could design
circuits efficiently starting from basic principles, perhaps
by starting with a complete Boolean expansion for the desired
function and simplifying it step by step. Such a machine
would be rather slow (unless it were built to operate at
electronic speeds, and perhaps even in this case), and not
enough planning has been done to know whether such a machine
is practically feasible, but the fact that such a machine is
theoretically possible is certainly of interest, whether any-
one builds one or not.
Another question of theoretical interest is whether
a logical machine could be built which could design an im-
proved version of itself, or perhaps build some machine whose
over-all purpose was more complicated than its own. There
seems to be no logical contradiction involved in such a mach-
ine, although it will require great advances in the general
theory of automata before any such project could be confidently
undertaken.
To return to the relay circuit analyzer, a final
operation which it performs is done with the main control
switch in the prove position. When the start button is pressed and
the other 4-position switch is moved successively through the
W, X, Y, and Z positions, certain of the eight lamps
W, W', X, X', Y, Y', Z, Z' will light up. The analyzer has
carried out a proof as to which kinds of contacts are required
to synthesize the function using the method of reduction to
functions of one variable, which will be explained in a forth-
coming memorandum. The analyzer here ignores whatever circuit
has been plugged in the plugboard, and considers only the func-
tion specified by the sixteen 3-position switches. If every
circuit which satisfies these specifications requires a back
contact on the W relay, the W' light will go on, etc.
If, for instance, seven of the eight lights are on,
any circuit for the function requires at least seven contacts,
and if there is in fact a circuit which uses just seven, the
machine has, in effect, given a complete proof that this cir-
cuit is minimal. Circuits for which the machine can give such
a complete proof are fairly common, although there are also
circuits (which can be shown to be minimal by more subtle me-
thods of proof) which this machine could not prove minimal.
An example is the circuit of Figure 1. This can be simpli-
fied by the analyzer to a circuit of nine contacts, but in
the prove position the analyzer merely indicates that at least
eight contacts are necessary. It can be shown by other
methods that the 9-contact circuit is minimal. But at any rate,
the analyzer always gives a mathematically rigorous lower
bound for the number of contacts.
3. The Circuit and Operation of the Relay Circuit Analyzer
A complete circuit diagram of the analyzer is shown
in Figures 2 and 3. The circuit, as already mentioned, has
five modes of operation: 1. evaluating a circuit, 2. com-
paring a circuit with desired characteristics, 3. examining
a circuit for contacts that can be shorted without affecting
operation, 4. examining for contacts that can be opened with-
out affecting operation, and 5. proving that certain con-
tacts are necessary in any realization of the function. The
method of operation of the circuit will be described in turn
for each of these five modes of behavior.
4. Evaluation of a Circuit
In this mode of operation the machine goes through
in sequence the sixteen possible states of the relays W, X, Y
and Z that are involved in the circuit, and tests in each state
whether or not the circuit is closed. If it is closed, the
corresponding panel light is lit. In this process only the
right-hand part of the circuit in Figure 2 is involved and
switches S18 and S19 are both in the evaluate position. The
selector switch S17 goes through one complete revolution to
make this test. During this revolution the four relays W, X,
Y, and Z proceed sequentially through their sixteen states.
This sequence is produced by the first two wipers and decks
of the selector switch S17. At the first position (0000)
all four relays are unoperated. At the second step (0001),
ground on the second wiper operates relay Z, which locks in
on its own front contact. The circuit is then set to test
the situation where W, X and Y are unoperated and Z is oper-
ated. At the third step relay Y is operated and locks in on
its own front contact. At the fourth step Z is short-circuited
by the wiper of the first deck. This releases Z and produces
the state 0010. Proceeding in this manner it will be seen that
the four relays W, X, Y and Z go through the sixteen states
indicated. The circuit which is being tested may be thought
of as being connected between plugs P1 and P2 at the upper
left of the diagram. This network consists of contacts on
the four relays W, X, Y and Z. Actually some other contacts
are involved in the network between P1 and P2 (contacts on
the H relays) but in the present mode of operation these H
relays do not operate and do not affect the hindrance from
P1 to P2. For a given state of the relays W, X, Y and Z the
plugs PI and P2 will be connected together if, and only if,
the circuit being tested is closed for that state of the re-
lays. The relay G will, therefore, operate if, and only if,
the circuit is closed in the state in question. If it is
closed, a ground will be applied to the third wiper of the
selector switch S17 and this will fire the corresponding
neon lamp. If it is not closed +34 volts will be applied
to the lamp extinguishing it (if it is already fired). The
voltage across the lamp circuit, 84-24 or about 60 volts,
lies between the fire and sustain voltages for the neon
lamps. Consequently, if they are fired they will remain
fired, if extinguished they will remain out. Thus the lamps
remain in the state produced by the evaluation of the cir-
cuit even after the wiper has left the point in question.
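The stepping sequence described above changes one relay per step: 0000, 0001, 0011, 0010. Those first four states coincide with the reflected binary Gray code. Assuming the remaining twelve steps continue the same pattern (the full sequence is in Figure 2, which is not reproduced here), the whole order can be generated as follows, with bits written as W, X, Y, Z; the function name is ours.

```python
from typing import List

def relay_sequence(n: int = 4) -> List[str]:
    """States in reflected binary Gray code order, bits written as W X Y Z."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]

seq = relay_sequence()
print(seq[:4])  # ['0000', '0001', '0011', '0010']
```

In a Gray code each step operates or releases exactly one relay, which matches the one-relay-per-step action of the first two wipers described in the text.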
The movement of the stepping switch is produced by
a three-stage buzzer circuit consisting of relays U, V and P.
In the buzzing condition the parallel S' and T' combination
in series with U will be closed. The operation of U ener-
gizes V through the front U contact in series with the V
coil. The operation of V then operates P in a similar manner.
The operation of P releases U through the P' contact. This
releases V which releases P, etc.
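The three-stage buzzer behaves like a ring of three relays containing one inversion (the P' contact). A minimal discrete-time sketch, assuming every relay operates or releases exactly one time unit after its coil circuit changes (real operate and release times differ), shows the resulting six-step cycle. The `enabled` flag is a stand-in for the parallel S' and T' contacts; the function name is ours.

```python
from typing import List, Tuple

def buzzer_states(steps: int, enabled: bool = True) -> List[Tuple[bool, bool, bool]]:
    """Simulate relays U, V, P; each tuple is (U, V, P) after one time unit."""
    u = v = p = False  # all relays released at the start
    history = []
    for _ in range(steps):
        # U is fed through the S'/T' combination and the P' contact,
        # V through a U front contact, and P through a V front contact.
        u, v, p = enabled and not p, u, v
        history.append((u, v, p))
    return history

for state in buzzer_states(6):
    print(state)
```

Under this assumption P operates once every six time units, so a make contact on P (which pulses selector switch S17 in the text) delivers one pulse per buzzer cycle.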
At the start of an evaluation, switch S18 will be
in the evaluate position, switch S19 in the evaluate position,
selector switch S17 at position 22 (and relay S, therefore,
operated) and selector switch S16 at position 21 (with relay T,
therefore, operated). When the starting push button S20 is
pressed magnet M1 of stepping switch 1 is energized. When
S20 is released M1 releases and the stepping switch moves to
position one. This releases relay S and the three-stage
buzzer U, V, P starts operating. At each cycle of this buz-
zer the coil of selector switch S17 is energized and released
by a make contact on the P relay. This sequences the relays
W, X, Y and Z through their sixteen states, as already des-
cribed, and indicates on the neon lamps the states for which
the circuit being tested is closed. When the wipers reach
level 22 relay S operates, stopping the buzzer and ending the
test.
5. The Comparison Mode of Operation
In this mode of operation the circuit set up on the
plugboard is to be compared with the settings of the sixteen
three-position switches. If in any state the circuit disagrees
with the switch setting the corresponding neon lamp will light
up. For this test switch S18 is set in the evaluate position
and switch S19 in the compare position. When the starting push
button S20 is pressed, the buzzing circuit U, V, P starts as
before, cycling the selector switch S17 through one complete
revolution. The four relays, as before, go through their six-
teen possible states and the relay G, as before, operates or
not, depending on whether the circuit being tested is closed
or not. The lamps, however, are no longer controlled directly
by the relay G, but instead by contacts on the relay A. The
relay A is connected to operate, if, and only if, the circuit
condition of the network being tested (open or closed) dis-
agrees with the setting of the corresponding three-position
switch. This result is obtained by having one end of the coil
of relay A connected (via the fourth wiper of selector switch
S17) to +24 volts, nothing (i.e. floating) or minus, according
as the desired behavior of the circuit in the state in question
is open, "don't care", or closed (as represented by the setting
of the three-position switch). The other end of the relay A
is connected to +24 volts or minus, according as the actual
circuit under test is open or closed (this being carried out
by a transfer on the G relay). The relay A will operate only
if the two ends of the coil receive different polarities, and
this will occur only if the switch setting differs from the
state of the network under test as indicated by the state of
the relay G. If such a disagreement occurs the corresponding
lamp is fired by a ground coming in the third wiper of selec-
tor switch S17.
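The polarity rule for relay A amounts to a small truth table. The Python sketch below is a hypothetical model that assumes only what the paragraph states: the switch end is at +24 volts for open, floating for don't care, and minus for closed; the G end is at +24 volts for open and minus for closed; and A operates only when both ends are connected and differ. The string labels stand in for the supply voltages, and the function names are ours.

```python
from typing import Optional

PLUS_24, MINUS = "+24 V", "minus"  # labels only; no voltages are modelled

def switch_end(requirement: Optional[bool]) -> Optional[str]:
    """End fed by the three-position switch: False = open, True = closed,
    None = don't care (the coil end is left floating)."""
    if requirement is None:
        return None
    return MINUS if requirement else PLUS_24

def g_end(circuit_closed: bool) -> str:
    """End fed by the G relay transfer: minus if closed, +24 V if open."""
    return MINUS if circuit_closed else PLUS_24

def relay_a_operates(requirement: Optional[bool], circuit_closed: bool) -> bool:
    """A operates only when both ends are connected and see different polarities."""
    end = switch_end(requirement)
    return end is not None and end != g_end(circuit_closed)

for req in (False, None, True):
    print(req, [relay_a_operates(req, closed) for closed in (False, True)])
# prints:
# False [False, True]
# None [False, False]
# True [True, False]
```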
The starting and stopping are carried out by the
same means as used in the evaluate mode.
6. The Short Test
In testing for contacts in the circuit that can be
shorted, the sequencing is somewhat more involved. Roughly
speaking, the various contacts used in the circuit are short-
circuited one-by-one, and for each contact the circuit goes
through a sequence similar to the comparing mode of behavior
just described (comparing the circuit when this contact is
shorted with the desired characteristics set up on the three-
position switches). If any disagreement is found, the neon
lamp associated with the contact in question is fired, indi-
cating that this contact is necessary in the circuit and cannot
be shorted. Actually, the sequence is a bit more complicated
since to save time and equipment the tests on the make and
break parts of a transfer in the circuit being tested are
interleaved.
To carry out the short test, switch S18 is put in
the short position (the position of S19 is irrelevant). The
selector switches S16 and S17 start in positions 21 and 22
respectively, so that relays S and T are both operated. When
the starting button S20 is pressed, the magnets of both S16
and S17 are energized and when S20 is released they step
ahead one step releasing both S and T and allowing the buzzer
circuit to start. The first step of selector switch S16
causes E to operate. This removes the voltage from the in-
dicating lamps L16 to L39 (removing any indication on these
lamps from previous runs). Stepper 1 then proceeds through
a complete revolution. At step 17 the second wiper applies
a voltage to the coil of S16, pulsing S16 ahead one notch.
This releases E, and reapplies voltage to the indicating
lamps L16 to L39. The wipers of selector switch S16 are now
connected to position 1 (the top row) of this selector. The
sixth wiper operates relay H1 which disconnects the first W
transfer from the circuit being tested. The three points in
the circuit being tested that were previously connected to
this transfer (on the W relay) are brought down to points
P3, P5 and P7, P5 coming through the third wiper. The free
ends of the W transfer, that are now disconnected from the
circuit being tested are brought down via wipers 2 and 4. To
test whether either part of this transfer can be shorted, the
selector switch S17 goes through a complete cycle, putting
the relays W, X, Y and Z in each possible state as in prev-
ious modes of operation. In each state, the first test is
to short P3 to P5, which in effect shorts the nodes of the
circuit normally connected to the W part of the contact, and
the circuit state is compared with the desired specification
on the three-position switch. A disagreement operates relay
A which, by way of wiper 1, fires the lamp corresponding to
the W contact. This shorting of the nodes occurs in the buz-
zer cycle during the period when the relay U is operated.
The A contact is connected to the corresponding lamp through
contact V and P' in series. This gives relay A time to oper-
ate (or release from a previous operation) before its reading
is applied to the lamp, and also disconnects the lamp before
the state of A is changed by the next operation.
The second test in the same buzzing cycle is to
short the break contact of the transfer. This occurs when U
releases, connecting P3 to P4 and P5 to P7. The W make is
then connected as usual in the circuit being tested (via the
H1 make, U' and wiper 2) and the nodes previously connected
to the back W' contact are shorted via the 3rd wiper of sel-
ector switch S16. In this part of the buzzing cycle the dis-
agreement relay is connected via P and V' contacts (for timing
margins similar to P' and V before) and the 5th wiper, to the
lamp corresponding to the W' or break contact. This lamp
will fire, as before, if a disagreement occurs indicating that
the contact is necessary.
After selector switch S17 has run through all states
(rows 1 to 16) it applies ground through wiper 2 to the magnet
of selector switch S16, advancing it one step. The machine
now applies the shorting test to the X and X' contacts connected
to the second row of selector switch S16. Proceeding in this
manner it tests all the contacts. On reaching row 13, the 6th
wiper of selector S16 applies ground to its own coil through
its own back contact. This causes it to step rapidly through
the remaining positions until it reaches row 21 where it oper-
ates relay T. The first selector switch is meanwhile still
being pulsed by the buzzer circuit. After T operates, the
first time S17 reaches row 22, relay S operates and the buz-
zer stops. This completes the test.
If it is desired to hurry the machine through the
latter part of a test (for example if only a few of the avail-
able contacts are being used and these are near the top) the
reset button S21 can be pressed. This causes S16 to run
rapidly to the stop position (row 21).
7. The Open Test
The test for opening contacts proceeds exactly as
the short test just described, except that having switch S18
in the open position opens wiper 3 of S16. This opens the
short that was applied in the previous test to the nodes
normally connected to the contact being tested. The relay
therefore indicates the behavior of the circuits when the
different contacts are opened.
8. The "Prove" Mode of Operation
When switch S18 is set in the "prove" position
the machine indicates, by lighting some of the lamps L40 to
L47, that certain contacts are necessary in any circuit which
realizes the switching function set up on the sixteen three-
position switches. This indication is obtained by moving
switch S22 through its four possible positions. In the W
position the machine tests whether W and/or W' contacts are
necessary and if so, lights the corresponding lamps etc.
The method of operation is based on the following
result in switching theory (stated for simplicity for the case
of four variables). At least one W (make) contact is necess-
ary in any realization of a given switching function if there
are one or more states of the other relays (X, Y, and Z) such
that when the X, Y and Z relays are in such a state, changing
the W relay from unoperated to operated changes the function
from open to closed. At least one W' (break) contact is nec-
essary if there exists a state of the X, Y and Z relays such
that when they are in this state, operating the W relay changes
the circuit from closed to open. These are both obvious, since
the only way by which operating the W relay alone could close a
previously open circuit is by establishing an operating path
through a make contact on the W relay, and similarly for the
condition with a break contact.
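The rule just stated, with don't care states excluded (they impose no constraint, just as the analyzer leaves those vertices floating), can be written as a short Python sketch. The state numbering (bits read as W, X, Y, Z), the list encoding of the sixteen switch settings, and the function name are assumptions for illustration. The size of the returned set is the lower bound on the number of contacts that the prove mode reports.

```python
from typing import List, Optional, Set

RELAYS = "WXYZ"

def necessary_contacts(spec: List[Optional[bool]]) -> Set[str]:
    """spec[s] is True (closed), False (open) or None (don't care); the state s
    is a 4-bit number read as WXYZ. Returns names such as "W" or "X'"."""
    needed: Set[str] = set()
    for k, relay in enumerate(RELAYS):
        bit = 1 << (3 - k)
        for s in range(16):
            if s & bit:
                continue  # start each pair from the state with this relay released
            before, after = spec[s], spec[s | bit]
            if before is None or after is None:
                continue  # a don't care state imposes no constraint
            if not before and after:
                needed.add(relay)        # open -> closed: a make contact is needed
            if before and not after:
                needed.add(relay + "'")  # closed -> open: a break contact is needed
    return needed

w_only = [bool(s & 0b1000) for s in range(16)]
w_xor_x = [bool(s & 0b1000) != bool(s & 0b0100) for s in range(16)]
print(sorted(necessary_contacts(w_only)))   # ['W']
print(sorted(necessary_contacts(w_xor_x)))  # ['W', "W'", 'X', "X'"]
```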
The condition that a W contact is necessary can
also be thought of geometrically in the following way. The
sixteen states of the four relays can be thought of as the
vertices of a four-dimensional cube. This cube consists of
two three-dimensional subcubes, the first being the eight
states of the X, Y, Z relays with W not operated, and the
second, the eight states of the X, Y, Z relays with W opera-
ted. If there is any point in the "W unoperated" cube in
which the circuit is open (closed) while being closed (open)
in the corresponding point of the "W operated" cube, at least
one W (W') contact is necessary.
The "Prove" part of the circuit can best be under-
stood in terms of this geometrical picture. A two-terminal
network with terminals a and b is set up in the machine,
corresponding to this cube. Every vertex of the cube for which
the circuit should be closed is connected to terminal a; all
vertices for which the circuit should be open are connected
to terminal b ("don't care" vertices are left floating). When
testing for the necessity of W or W' contacts, eight diodes
are connected between corresponding points of the three-
dimensional subcubes mentioned above. These point from the
"W unoperated" subcube to the "W operated" subcube. Current
will pass from terminal a to terminal b if and only if a W
contact is necessary. This is true since this conduction
can take place only by entering the cube at a closed state
(these being the only ones connected to terminal a), passing
through a diode in the conducting direction (this requires
that the closed state be in the "W unoperated" cube) and leav-
ing the cube to terminal b at an open state. Thus the con-
ditions for conduction from a to b are identical with the con-
ditions for necessity of a W contact. In a similar manner, it
may be seen that the network will conduct from b to a if and
only if a W' contact is necessary.
In operation, the circuit is alternately tested for
conduction in the two directions. The alternation is obtained
by operation of the three-stage buzzer previously described.
When P is operated, the circuit is tested for conduction from
A to B. If this condition occurs, it fires the corresponding
neon lamp (for the W, X, Y or Z make contact). When P is re-
leased, voltage is applied to the AB network in the reverse
direction and if conduction occurs, it fires the correspond-
ing neon lamp (for the W', X', Y' or Z' break contact). These
lamps remain fired until released either by turning off the
main power or flipping the "evaluate-compare" switch S19 from
one position to the other.
Although it has been explained that the circuit for
doing these tests is laid out in the shape of a four-dimensional
cube, the circuit diagram of Figure 3 is not drawn by the use
of a direct projection of such a cube, but is laid out in a
plane by a method due to W. Keister (The Design of Switching
Circuits, D. Van Nostrand, 1951, p. 174), which simplifies its
appearance.
It can easily be verified that by putting switch
S22 in any one of its four positions the circuit in Figure 3
reduces to a 4-dimensional cube with 8 diodes joining its two
halves. However the manner in which these 4 sets of 8 diodes
each were combined to give a total of only 14, while at the
same time using only 8 decks of the switch S22, may be of in-
terest. It can be applied to give similar economies in the
design of analogous circuits for cubes of any dimension. This
method depends on some concepts due to R. W. Hamming (Bell
System Technical Journal, 29, pp. 147-160, April, 1950). It
is possible to divide the vertices of an n-cube into two mu-
tually exclusive and collectively exhaustive classes, called
parity classes, depending on whether the number of coordinates
having the value 1 is even or odd. If a point belongs to one
parity class, all of the points which have distance 1 from it
(and hence differ in only one coordinate from it) are in the
opposite parity class. This means that every edge of the cube
connects vertices of opposite parity classes. Since in every
position of S22 the diodes are connected along edges of the
cube, it means that it is necessary to be able to connect
diodes only between points of opposite parity classes.
Thus the diodes are all connected to the points of
one parity class, and the decks of switch S22 are connected to
the points of the other class. If one diode pointing toward
and one pointing away from each point of the even parity class
is provided, then the switch contacts can connect each point of
the other parity class to the other end of the proper one of
these two diodes. In the actual circuit not quite this many
diodes are used, since the points 0000 and 1111 require only
one of the two diodes.
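The count of fourteen diodes can be checked by enumerating the 4-cube. The sketch below assumes the reading given above: in each position of S22 the eight diodes run along the edges in one coordinate direction, from the vertex with that coordinate 0 to the one with it 1; every edge joins an even and an odd vertex; and a diode hung on an even vertex can be reused in every position that needs the same vertex and direction. It counts (vertex, direction) pairs and does not model the wiring of the switch decks; the function names are ours.

```python
from typing import List, Set, Tuple

def diode_edges(n: int, k: int) -> List[Tuple[int, int]]:
    """Diodes for selector position k: from each vertex with coordinate k = 0
    (relay unoperated) to the neighbouring vertex with coordinate k = 1."""
    bit = 1 << k
    return [(v, v | bit) for v in range(2 ** n) if not (v & bit)]

def is_even(v: int) -> bool:
    return bin(v).count("1") % 2 == 0

def shared_diodes(n: int) -> Set[Tuple[int, str]]:
    """Diodes hung on even-parity vertices, one per (vertex, direction) needed."""
    needed = set()
    for k in range(n):
        for tail, head in diode_edges(n, k):
            if is_even(tail):
                needed.add((tail, "away"))    # diode points away from the even vertex
            else:
                needed.add((head, "toward"))  # diode points toward the even vertex
    return needed

print([len(diode_edges(4, k)) for k in range(4)], len(shared_diodes(4)))
# prints: [8, 8, 8, 8] 14
```

Only 0000, which is never the head of an edge, and 1111, which is never the tail, need a single diode; the six even vertices with two coordinates equal to 1 need both, giving 1 + 12 + 1 = 14.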
9. Notes and Comments
The small size and portability of this machine depend
on the fact that a mixture of relay and electronic circuit ele-
ments were used. The gas diodes are particularly suited for use
where a small memory element having an associated visual display
is required, and the relays and selector switches are particu-
larly suited for use where the ability to sequence and inter-
connect using only a small weight and space is required. In
all, the relay circuit analyzer uses only 24 relays, 2 selector
switches, 48 miniature gas diodes, and 14 germanium diodes as
its logical elements.
It may be of interest to those familiar with gen-
eral purpose digital computers to compare this method of solu-
tion of this problem on such a small, special-purpose machine
with the more conventional method of coding it for solution on
a high-speed general-purpose computer. One basic way in which
the two methods differ is in the directness with which the cir-
cuits being analyzed are represented. On a general-purpose
computer it would be necessary to have a symbolic description
of the circuit, probably in the form of a numerical code des-
cribing the interconnections of the circuit diagram, and repre-
senting the types of contacts that occur in the various parts
of the circuit by means of a list of numbers in successive
memory locations of the computer. On the other hand, the relay
circuit analyzer represents the circuit in a more direct and
natural manner, by actually having a copy of it plugged up on
the front panel.
This difference in the directness of representation
has two effects. First, it would be somewhat harder to use
the general-purpose computer, because the steps of translating
the circuit diagram into the coded description and of typing
it onto the input medium of the computer would be more compli-
cated and lengthy than the step of plugging up a circuit dir-
ectly. The second effect is in the relative number of logical
operations (and hence, indirectly, the time) required by the
two kinds of machines. To carry out the fundamental step in
this procedure of determining whether the given circuit (or
some modification of it obtained by opening or shorting a
contact) is open or closed for some particular state of the
relays requires only a single relay operate time for the re-
lay circuit analyzer. However, the carrying out of this fun-
damental step on a general-purpose digital computer would re-
quire going through several kinds of subroutines many times.
There would be several ways of coding the problem, but in a
typical one of them the computer would first go through a
subroutine to determine whether a given contact were open or
closed, repeating this once for each contact in the circuit,
and then would go through another subroutine once for each
node of the network. Altogether this would probably involve
the execution of several hundred orders on the computer, al-
though by sufficiently ingenious coding this might be cut down
to perhaps 100. Since each order of a computer takes perhaps
100 times the duration of a single logical operation (i.e., a
pulse time, if the computer is clock-driven), it turns out that
what takes 1 operation time on one machine takes perhaps 10,000
on another.
Since 10,000 is approximately the ratio between
the speed of a relay and of a vacuum tube in performing logical
operations, this gain of about 10,000 from the directness of
the representation permits this relay machine to be as fast as
a general-purpose electronic computer.
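The estimate above reduces to one line of arithmetic. In this sketch, $\tau_{\text{tube}}$ stands for one pulse time (a single logical operation) of the electronic computer and $\tau_{\text{relay}}$ for one relay operate time. These symbols are introduced only for the illustration, and every figure is the text's own round estimate:

```latex
\frac{T_{\text{general-purpose}}}{T_{\text{analyzer}}}
  \approx \frac{10^{2}\ \text{orders} \times 10^{2}\ \tau_{\text{tube}}/\text{order}}{1 \times \tau_{\text{relay}}}
  = \frac{10^{4}\,\tau_{\text{tube}}}{\tau_{\text{relay}}}
  \approx 1 \quad \text{when } \tau_{\text{relay}} \approx 10^{4}\,\tau_{\text{tube}}.
```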
This great disparity between the speeds of a general-
purpose and of a special-purpose computer is not typical of
all kinds of problems, since a typical problem in numerical
analysis might only permit of a speed-up by a factor of 10
on a special-purpose machine (since multiplications and div-
isions required in the problem use up perhaps a tenth of the
time of the problem). However, it seems to be typical of
combinatorial problems that a tremendous gain in speed is
possible by the use of special rather than general-purpose
digital computers. This means that the general-purpose machines
are not really general in purpose, but are specialized
in such a direction as to favor problems in analysis. It is
certainly true that the so-called general purpose machines
are logically capable of solving such combinatorial problems,
but their efficiency in such use is definitely very low. The
problems involved in the design of a general-purpose machine
suitable for a wide variety of combinatorial problems seem to
be quite difficult, although certainly of great theoretical
interest.
10. Conclusion
An interesting feature of the relay circuit analy-
zer is its ability to deal directly with logical circuits in
terms of 3-valued logic. There would be considerable interest
in techniques permitting easy manipulation on paper with such
a logic, because of its direct application to the design of
economical switching circuits. Even though such techniques
have not yet been developed, machines such as this can be of
value in connection with 3-valued problems.
Whether or not this particular kind of machine
ever proves to be useful in the design of practical relay
circuits, the possibility of making machines which can assist
in logical design procedures promises to be of value to
everyone associated with the design of switching circuits.
Just as the slide rule and present-day types of digital com-
puters can help perform part of the routine work associated
with the design of linear electrical networks, machines such
as this may someday lighten much of the routine work assoc-
iated with the design of logical circuits.
Attached:
Photograph No. 196492
Figures 1, 2 and 3
C. E. SHANNON
E. F. MOORE
FIGURE 1
THE RELAY CIRCUIT ANALYZER WAS ABLE TO SIMPLIFY
THIS CIRCUIT, REMOVING ONE CONTACT, IN LESS THAN
TWO MINUTES TOTAL TIME. CAN YOU DO AS WELL?
[Circuit diagram]

FIG. 2
MAIN CIRCUIT DIAGRAM OF RELAY CIRCUIT ANALYZER
Bell Telephone Laboratories, Inc., drawing B-349291
[Circuit diagram; legend: relay coil, front contact, back contact, selector switch]

FIG. 3
PROVE CIRCUIT OF RELAY CIRCUIT ANALYZER
Bell Telephone Laboratories, Inc., drawing B-349292
[Circuit diagram]
THROBAC - CIRCUIT OPERATION
The central part of the Throbac circuit is a relay accumulator
which can count up to eighty (that is, hold any number from zero
to seventy-nine) in a modified Roman numeral system. The
accumulator is arranged so that it is possible to add or subtract
I, V, X or L to the contents of the accumulator. It consists of
seven stages of W-Z circuits. The first three stages, W1-Z1, W2-Z2
and W4-Z4, accumulate "I's". These stages are arranged to count up
to four and recycle to zero at the fifth I. Thus, within these
stages either zero, one, two, three or four "I's" will be
registered. The number of "I's" appears in binary form in the
three stages of W-Z.
The next W-Z combination accumulates "V's", either zero or one V
being registered here. The final three stages, WX1-ZX1, WX2-ZX2
and WX4-ZX4, accumulate "X's" from zero up to seven.
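The register layout can be made concrete in a few lines. This is a minimal sketch under stated assumptions: three binary stages hold the I count (0 to 4), one stage holds V, and three binary stages hold the X count (0 to 7), so that L is held as five X's. The function names and the dictionary layout are hypothetical. The mod-80 wraparound and the overflow flag are an assumed software stand-in for the carry or borrow that the next paragraph says goes to the C relay.

```python
def to_stages(value: int) -> dict[str, str]:
    """Split 0 <= value <= 79 into I, V and X counts, each shown as binary stage states."""
    if not 0 <= value <= 79:
        raise ValueError("the accumulator holds 0..79 only")
    x_count, rest = divmod(value, 10)   # tens, held as 0..7 X's (L is five X's)
    v_count, i_count = divmod(rest, 5)  # 0..1 V and 0..4 I's
    return {
        "I": format(i_count, "03b"),    # stages W4-Z4, W2-Z2, W1-Z1
        "V": format(v_count, "01b"),    # the single V stage
        "X": format(x_count, "03b"),    # stages WX4-ZX4, WX2-ZX2, WX1-ZX1
    }

def from_stages(stages: dict[str, str]) -> int:
    """Inverse of to_stages."""
    return int(stages["I"], 2) + 5 * int(stages["V"], 2) + 10 * int(stages["X"], 2)

def add_symbol(value: int, symbol: str, add: bool = True) -> tuple[int, bool]:
    """Add (F operated) or subtract (F released) one of I, V, X, L.

    Returns the new contents, assumed to wrap modulo 80, and a flag that is
    True when the result overran 0..79 (the case the text routes to relay C).
    """
    step = {"I": 1, "V": 5, "X": 10, "L": 50}[symbol]
    result = value + step if add else value - step
    return result % 80, not 0 <= result <= 79

print(to_stages(64))   # {'I': '100', 'V': '0', 'X': '110'}
```

For 64 the sketch reports the I stage at 4 and the X stages at 6, which matches the LXIV example given below in the text.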
If the relay F is operated, the accumulator is arranged to add; if
F is released, to subtract. Supposing F operated, closing PI adds
I to the contents of the accumulator, closing PV adds V, PX adds X
and PL adds L. This may be verified by tracing out the circuit
paths into the W-Z circuits in the various cases. For example, if
the accumulator has zero in it, all W's and Z's are released, and
when PI is closed a ground passes through a chain of contacts
PI-F-Z1-F to pulse the W1-Z1 pair, and this is the only W-Z pair
to receive a ground. If, instead, PL had been pulsed, the WX1-ZX1
pair and the WX4-ZX4 pair would both receive ground, thus
registering L (XXXX + X). A study of the circuit will show that in
all cases it adds or subtracts (according to the state of F) I, V,
X or L when PI, PV, PX or PL is operated.
At the bottom of this circuit a connection leads out to
control the C relay. This connection will be seen to carry a
ground when a number is added to the accumulator which causes it
to overrun its limit, either by addition, giving a number greater
than seventy-nine, or, by subtraction, giving a number less than
zero. In these cases the carry to, or borrow from, what would be
the next column goes out on the lead in question to control the
C relay. This relay, to be described later, indicates the end
of a division.
The number registered in the accumulator is displayed
on the panel by means of a series of thirteen lights. These
lights are controlled by contact networks on the W-Z relays of
the accumulator. The contact networks translate from the modified
Roman numeral notation to the standard one. The part of the number
which is a multiple of ten appears in the three left columns of
lights: L7 or X7, L6 or X6, and L5 or X5. The part of the number
registered which is less than ten appears in the four right
columns of lights.
As an example, suppose the number registered is LXIV
(64). In the accumulator the W-Z pairs W4-Z4 (IIII), WX4-ZX4 and
WX2-ZX2 (XXXXXX) will be operated and the other W-Z pairs released.
In the accumulator light circuit it will be found that the
corresponding lights receive a ground and are illuminated,
displaying the number LXIV.
The sequencing for adding or subtracting a number entered
in the keyboard into the accumulator is carried out chiefly by
stepping switch A. For such an addition or subtraction, this
stepper sweeps across the keyboard, starting from the right-hand
column and sequentially adding or subtracting the numbers registered
in each column. The addition sequence is started by pressing the
ADD button, which causes P to operate and lock in through a back
contact on K. The operation of P causes the buzzer or relay B to
start operating and releasing at about ten cycles per second. When
B closes it pulses the stepping coil of stepper A, moving it ahead
one notch. The release of B puts a ground on the wipers of the
stepper and, therefore, on the first vertical connection through
the keyboard switches. Let us suppose that the number XLVI is
entered in the keyboard in the four right-hand columns. I is then
registered in the rightmost column, and the ground from the stepper
passes through this I push button to operate the PI relay. The F
relay has been operated by P and therefore I is added to the
previous contents of the accumulator. On the next cycle of the
buzzer, the stepper moves to the next column and operates the PV
relay, which adds V into the accumulator. PV also causes E to
operate and lock in. The purpose of this is to cause any further
I's to be subtracted rather than added. On the next cycle of the
buzzer, ground is applied to the third vertical of the keyboard
and, because of the L entered there, operates the PL relay. This
adds L to the accumulator and also operates the S relay, which also
locks in. The operation of S signifies that an L has occurred and
consequently any X's or V's now encountered on the keyboard must
be subtracted. On the next cycle of the buzzer, the fourth vertical
receives ground and, because of the X in this column, PX operates.
Since S is closed, the relay H also operates, releasing F and making
the accumulator subtract instead of add. The timing of these relays
is adjusted so that F releases before the PX pulse could add into
the accumulator. X is therefore subtracted. On the next three
cycles of the buzzer, no further numbers are encountered and the
accumulator does not change. On the eighth cycle, the wipers pass a
ground to the K relay, which locks in momentarily, and also to the
reset coil of the stepper. The operation of K releases relays P, E
and S and also disconnects the buzzer and the wipers. The reset
coil allows the wipers to return to their normal position, and
since they have been disconnected by K they have no effect as they
pass over the keyboard columns. When the wipers reach their normal
position they open the off-normal switch of the stepper. This
releases K and the addition operation is complete.
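In effect, relays E, S and H implement the familiar rule for reading a subtractive Roman numeral from right to left: a symbol smaller than one already passed is subtracted. The sketch below paraphrases that rule in software and does not model the relay circuit itself. It assumes one symbol per keyboard column, scanned right to left as stepper A does, and applies the general "smaller than the largest so far" rule. That rule goes slightly beyond the two cases the text spells out (I's after a V, and X's or V's after an L). The function name is hypothetical.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}

def scan_keyboard(columns: str) -> list[tuple[str, int]]:
    """Scan keyboard columns right to left, returning each symbol's signed contribution.

    A symbol is subtracted when a larger symbol has already been passed, which
    covers the cases the text assigns to relays E and S.
    """
    largest_seen = 0
    steps = []
    for symbol in reversed(columns):
        value = VALUES[symbol]
        sign = -1 if value < largest_seen else 1
        steps.append((symbol, sign * value))
        largest_seen = max(largest_seen, value)
    return steps

steps = scan_keyboard("XLVI")
print(steps)                      # [('I', 1), ('V', 5), ('L', 50), ('X', -10)]
print(sum(v for _, v in steps))   # 46
```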
The process of subtraction is essentially the same.
Pressing the subtract button causes M to operate and lock up, which
starts the buzzer and the stepping operations. In this case,
however, F is normally released, so that numbers encountered in
the keyboard are normally subtracted. However, when a smaller
number is encountered after a larger one the relay F will operate,
causing it to be added.
Multiplication is obtained by successive addition. If
the MV button is pressed, the machine adds the contents of the
keyboard into the accumulator V times; if the MX button is pressed,
X times. This counting is controlled by stepper B. If the MI
button is pressed, the keyboard contents are added or subtracted
once, depending on whether the MV or MX buttons have been
previously operated.
Suppose VIII is to be multiplied by IV. VIII is entered
in the keyboard, and first the MV and then the MI push buttons are
pressed. When the MV button is pressed, relay MV operates and
locks in through Q. The relay T also operates, locking in through
the Clear Upper key. The relay T signifies that I's occurring later
in the multiplier must be interpreted as negative. The operation
of MV causes the P relay to operate and start an addition operation.
When stepper A reaches the eighth point, K operates, causing the
stepping coil of stepper B to receive a ground (through the MV
make). When stepper A resets to normal, P again operates, again
adding the keyboard contents into the accumulator and advancing
stepper B at the end of the addition. This process continues until
stepper B reaches its fifth point. There the ground on the wipers
operates relay Q, which releases MV and stops the series of
additions. Q locks in and applies ground to the reset coil of
stepper B, returning it to normal. When it reaches normal, the
off-normal contacts are opened and Q is released.
Next the MI button is pressed. Since T is in (due to
the previous operation of MV), this causes M to operate and the
machine subtracts the keyboard contents from the accumulator. This
completes the multiplication. The MX button produces a sequence
similar to that of the MV button, except that stepper B must go to
the tenth point instead of the fifth to operate Q and stop the
series of additions.
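The multiplication procedure can be paraphrased as repeated addition driven by the multiplier buttons. This is a minimal sketch, assuming the multiplier buttons are pressed right to left, as in the IV example (MV, then MI). MV adds the keyboard contents five times and MX ten times. MI adds once, or subtracts once if T has already been set by MV or MX. The function name is hypothetical, and the accumulator's 0 to 79 limit is not enforced.

```python
REPEATS = {"MV": 5, "MX": 10}   # additions counted out by stepper B

def multiply(keyboard_value: int, buttons: list[str], accumulator: int = 0) -> int:
    """Multiply by successive addition, buttons taken in the order pressed."""
    t_relay = False  # set by MV or MX; later MI presses then subtract
    for button in buttons:
        if button in REPEATS:
            for _ in range(REPEATS[button]):
                accumulator += keyboard_value
            t_relay = True
        elif button == "MI":
            accumulator += -keyboard_value if t_relay else keyboard_value
        else:
            raise ValueError(f"unknown button {button!r}")
    return accumulator

print(multiply(8, ["MV", "MI"]))   # VIII x IV = 32
```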
If another multiplication is to be performed, the Clear
Upper button should be pressed. This releases T and resets stepper
B to normal if for some reason it is not already there.
Division is performed by successive subtraction. The
dividend is entered in the accumulator and the divisor in the
keyboard. When the divide button is pressed, relay E operates and
locks in through back contacts on P or K. C is normally out and E,
therefore, causes M to operate and lock in, starting a subtraction.
If, during this subtraction, the accumulator does not run through
zero, C will not operate and another subtraction will occur, since
M will again operate as soon as K releases. At each subtraction of
this sort the operation of K at the end of the subtraction
energizes the stepping coil of stepper B, advancing it one step.
Eventually in this subtraction process the contents of the
accumulator will go negative. This causes C to operate and
indicates that one too many subtractions have been performed. The
last subtraction is not counted on stepper B, since its operating
path passes through a back contact of C. The operation of C causes
the next operation to be an addition, since the next ground when K
releases is placed on P rather than M. The machine therefore goes
through one addition sequence (compensating in the accumulator for
the extra subtraction). At the eighth point of this sequence K
operates and, since P is operated, the hold on E opens and E
releases. This stops any further additions or subtractions and also
releases the C relay for the next division. Stepper B will be at a
level equal to the number of subtractions (not counting the extra
one), and its position is therefore the desired quotient. The value
of this quotient is indicated on the quotient lights, which are
wired to the contacts of the stepper in such a way as to indicate
in Roman numerals the position of the wipers. This dial is cleared
by pressing the Clear Upper button, which operates the reset coil
of stepper B.
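The division sequence, including the one extra subtraction that drives the accumulator negative and the compensating addition, can be sketched as follows. In this hypothetical function, `quotient` plays the part of stepper B and the negative-accumulator test plays the part of relay C. The sketch assumes a positive divisor. It also returns what is left in the accumulator after the compensating addition, which is the remainder.

```python
def divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide by successive subtraction; return (quotient, contents left in the accumulator)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    accumulator, quotient = dividend, 0
    while True:
        accumulator -= divisor            # one subtraction sequence
        if accumulator < 0:               # C operates: one subtraction too many
            accumulator += divisor        # one compensating addition
            return quotient, accumulator  # the extra subtraction was never counted
        quotient += 1                     # K advances stepper B one step

print(divide(64, 8))   # (8, 0)
print(divide(79, 10))  # (7, 9)
```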
C. E. SHANNON
April 9, 1953
TOWER OF HANOI
C. E. Shannon
The Tower of Hanoi machine automatically solves a well-known puzzle
constructed as follows. There are three pegs standing upright in a horizontal plate.
On the first peg are a number of disks of graduated sizes. The problem is to move all
these disks to the third peg subject to the rules that (1) only one disk can be moved at
a time, and (2) a disk can never be placed on top of a smaller disk.
This puzzle has been treated in the literature. It can be readily proved by
induction that with n disks, $2^n - 1$ moves are necessary. For suppose this formula is
true up to n - 1. With n disks, in order to move the largest one to the third peg it is
necessary that all the other disks be on the second peg in proper order. This, by
assumption, requires $2^{n-1} - 1$ moves. Moving the largest disk requires one more, and
moving the n - 1 disks from the second to the third peg, again by the inductive
hypothesis, requires $2^{n-1} - 1$ moves. Consequently the entire operation requires
$2(2^{n-1} - 1) + 1 = 2^n - 1$ moves. Since the formula is true for n = 1, it holds in
general. The argument also shows how to build up a solution for any n from n - 1, and
hence, eventually, from the n = 1 case.
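The inductive argument translates directly into a recursive procedure. In this sketch, pegs are numbered 0, 1 and 2 as in the table that follows, and disks are numbered 1 (smallest) to n. The function name is illustrative. The procedure builds the n-disk move list from the (n - 1)-disk solution and confirms the $2^n - 1$ count.

```python
def hanoi(n: int, source: int = 0, spare: int = 1, target: int = 2) -> list[tuple[int, int, int]]:
    """Return moves (disk, from_peg, to_peg) solving n disks, built from the n-1 case."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, target, spare)    # move the n-1 smaller disks onto the spare peg
        + [(n, source, target)]                # move the largest disk
        + hanoi(n - 1, spare, source, target)  # restack the smaller disks on top of it
    )

for n in range(1, 7):
    assert len(hanoi(n)) == 2 ** n - 1
print(len(hanoi(6)))   # 63
print(hanoi(6)[:3])    # [(1, 0, 1), (2, 0, 2), (1, 1, 2)]
```

The first three moves agree with the table: the smallest disk goes to peg 1, the next smallest to peg 2, and then the smallest joins it.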
For n = 6 (the case handled by the machine) the solution is given by the following
table.
000000    000000
000001    000001
000010    000021
000011    000022
000100    000122
000101    000120
000110    000110
000111    000111
001000    002111
001001    002112
001010    002102
001011    002100
001100    002200
001101    002201
001110    002221
001111    002222
010000    012222
010001    012220
010010    012210
010011    012211
010100    012011
010101    012012
010110    012002
010111    012000
011000    011000
011001    011001
011010    011021
011011    011022
011100    011122
011101    011120
011110    011110
011111    011111
100000    211111
100001    211112
100010    211102
100011    211100
100100    211200
100101    211201
100110    211221
100111    211222
101000    210222
101001    210220
101010    210210
101011    210211
101100    210011
101101    210012
101110    210002
101111    210000
110000    220000
110001    220001
110010    220021
110011    220022
110100    220122
110101    220120
110110    220110
110111    220111
111000    222111
111001    222112
111010    222102
111011    222100
111100    222200
111101    222201
111110    222221
111111    222222
The first column gives the binary numbers from 0 to 63. The second column
describes the positions of the disks. For example, 000000 means that all disks are on
peg 0. The fifth entry 000122 means that the three largest disks are on peg 0, the next
smaller disk on peg 1, and the two smallest disks on peg 2. The numbers in the
second column are related in a peculiar manner to the binary numbers in the first
column and can be calculated from them. The process can best be described by an
example. Take, for instance, the binary number 010110. The following calculation is
performed.
    +  -  +  -  +  -
    0  1  0  1  1  0
    0  2  2  1  2  2
    0  1  2  0  0  2
The columns here alternate + and -. The second row 022122 is obtained by summing
the first row horizontally mod 3 with + or - sign depending on the column. Thus 0 = 0,
2 = 0-1, 2 = 0-1+0, 1 = 0-1+0-1, 2 = 0-1+0-1+1 and 2 = 0-1+0-1+1-0 (all mod 3). The
third row is obtained from the second by alternately adding and subtracting the first
row: 0+0, 2-1, 2+0, 1-1, 2+1 and 2-0 give 0, 1, 2, 0, 0, 2 (mod 3). This row is the
corresponding position of the disks in the solution of the puzzle. It can be shown that
this relation holds in general.
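The calculation can be stated as a short procedure. Write the count as bits $b_1 \dots b_6$, most significant first, with column signs $+, -, +, -, \dots$. The second row is then the signed running sum mod 3, and the third row adds the signed bit once more. The sketch below implements exactly that and checks two of the text's claims: the example 010110 gives 012002, and the 64 resulting positions form a legal solution, with one disk moving per step, never onto a smaller disk, ending with everything on peg 2. The function names are illustrative. The 3-disk check is an extra test of the "holds in general" remark.

```python
def disk_positions(count: int, n_disks: int = 6) -> str:
    """Map a binary count to disk positions, largest disk first, by the mod-3 rule."""
    bits = [int(c) for c in format(count, f"0{n_disks}b")]
    running, positions = 0, []
    for k, b in enumerate(bits):
        sign = 1 if k % 2 == 0 else -1              # columns alternate +, -, +, -, ...
        running = (running + sign * b) % 3          # second row: signed running sum mod 3
        positions.append((running + sign * b) % 3)  # third row: apply the signed bit again
    return "".join(str(p) for p in positions)

def is_legal_solution(n_disks: int = 6) -> bool:
    """Check that successive positions differ by one legal move, from all-0 to all-2."""
    states = [disk_positions(c, n_disks) for c in range(2 ** n_disks)]
    for before, after in zip(states, states[1:]):
        moved = [i for i in range(n_disks) if before[i] != after[i]]
        if len(moved) != 1:
            return False
        disk = moved[0]                  # index 0 is the largest disk
        smaller = range(disk + 1, n_disks)
        # a smaller disk on the source peg would lie on top; on the destination it would lie below
        if any(before[j] in (before[disk], after[disk]) for j in smaller):
            return False
    return states[0] == "0" * n_disks and states[-1] == "2" * n_disks

print(disk_positions(0b010110))   # 012002
print(is_legal_solution())        # True
```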
The Tower of Hanoi relay circuit is based on this curious relation. The machine
basically consists of a binary counter (six stages of W-Z counters) which counts from
0 to 63. Contacts on these relays are connected in a network which controls a set of
eighteen lights. There are three lights for each of the six disks, one on each of the
three pegs. At a given time, one of these three will be on, indicating the position of
the corresponding disk. As the counter proceeds through its count, the lights are
switched to indicate the process of the solution.
The circuit of the machine is shown in Fig. 1. The right-hand network controls the
lights. It will be seen that this consists of a symmetric function lattice in which the
stages alternately add and subtract mod 3. The ground coming in at the bottom of this
circuit will appear in columns 0', 1', 2' according to the first number computed in the
above calculation (i.e. 0'2'2'1'2'2' in the example given). The further calculation
(012002 in the example) is carried out by the single stage mod 3 circuits attached to
the basic mod 3 lattice.
It is interesting in this circuit that when one of the larger disks is moved the lamps
corresponding to smaller disks receive their operating current through a path which is
switched. The counting process, however, is so rapid that they appear to be
continuously illuminated.
The control circuit at the left of the figure contains a three-position key switch. In
the center position, the machine stops. In the top position, it causes the buzzer B to
operate the counter and therefore proceed through the solution at about two steps per
second. When the count reaches sixty-three, the buzzer stops. If the key switch is
depressed to the lower position (non-locking), the counter is advanced one count. By
moving the switch between the center and the lower positions the solution can be
observed step by step.
Mathmanship or How to Give an Explicit Solution Without Actually
Solving the Problem
After reading several weighty papers giving formulas
which assume only prime values, I felt moved to develop a few
further results of the same type.
Theorem 1. There exists a unique real positive number λ < 1
such that
    $$\epsilon_n = [2^n \lambda] - 2[2^{n-1} \lambda] = \begin{cases} 0 & \text{if } n \text{ is composite} \\ 1 & \text{if } n \text{ is prime} \end{cases}$$
Here [x] means, as usual, the largest integer in x.
The value of λ is .4146....
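The construction behind Theorem 1 is transparent once written out. λ is the number whose n-th binary digit is 1 exactly when n is prime, and $[2^n\lambda] - 2[2^{n-1}\lambda]$ simply extracts that digit. The sketch below builds a truncation of λ with exact rational arithmetic (Python's `fractions` module) and checks the digit extraction for small n. The cutoff `N` is an arbitrary choice for illustration.

```python
from fractions import Fraction
from math import floor

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

N = 60  # truncation depth (arbitrary); binary digits beyond N are dropped
# lambda = sum over primes p <= N of 2^(-p)
lam = sum((Fraction(1, 2 ** p) for p in range(2, N + 1) if is_prime(p)), Fraction(0))

def epsilon(n: int) -> int:
    """[2^n * lam] - 2 * [2^(n-1) * lam], i.e. the n-th binary digit of lam."""
    return floor(2 ** n * lam) - 2 * floor(2 ** (n - 1) * lam)

assert all(epsilon(n) == int(is_prime(n)) for n in range(1, N + 1))
print(float(lam))  # about 0.41468
```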
Theorem 2. There exists a unique real positive number μ < 1
such that the nth prime is given by
    $$p_n = [2^{(n+1)^2} \mu] - 2^{2n+1} [2^{n^2} \mu]$$
Note the improvement over previous results: this
formula gives all the primes, not just some of them.
For analysts who find the bracket symbol a little
suspect, we have the following:
Theorem 3. There exists a real number η such that $\sin 2^n \eta$ is
positive or negative according as n is prime or composite.
Theorem 4. There exists a real number θ such that
    $$|p_n - \tan 2^n \theta| < \tfrac{1}{2}$$
Proofs are left as an exercise for the reader.
C. E. SHANNON
6/3/53
Bell Telephone Laboratories, Incorporated
Cover Sheet for Technical Memorandum
Subject: The Relay Circuit Synthesizer - Case 20878
MM-53-140-52
MM-53-180-52
Date: November 30, 1953
Authors: C. E. Shannon, E. F. Moore
Filing Subject: Switching Theory
Copies to:
Case File (HWB-WOB-JBF) (BDH)
Date File
Area Central Files (4)
1 - M. L. Almquist
2 - H. W. Bode
3 - R. Bown
4 - E. Bruce
5 - A. J. Busch
6 - A. B. Clark
7 - W. H. Doherty
8 - E. B. Ferrell
9 - J. B. Fisk
10 - H. T. Friis
11 - T. C. Fry
12 - G. W. Gilman
13 - D. W. Hagelbarger
14 - B. D. Holbrook
15 - A. C. Keller
16 - F. A. Korn
17 - W. D. Lewis
18 - C. A. Lovell
19 - M. B. McDavitt
20 - J. Meszar
21 - R. K. Potter
22 - F. J. Singer
23 - S. H. Washburn
24 - I. G. Wilson
ABSTRACT
The Relay Circuit Synthesizer is a machine to aid
in switching circuit design. It is capable of designing two
terminal circuits involving up to four relays in a few minutes.
The solutions are usually minimal. The machine, its operation,
characteristics and circuits are described.
The Relay Circuit Synthesizer - Case 20878
MM-53-140-52
MM-53-180-52
November 30, 1953
MEMORANDUM FOR FILE
Purpose and Operation
The Relay Circuit Synthesizer (Photograph 214142)
is a machine to aid in the design of a certain class of relay
circuits. The type of circuits it handles are two-terminal
switching circuits involving up to four relays or (by simple
alterations) other two-valued elements. The desired charac-
teristics of the circuit to be designed are entered in a set
of sixteen three-position switches on the front panel of the
machine. After a period of computation, averaging about five
minutes, the machine stops and displays a circuit satisfying
the requirements. The circuit is displayed in geometric form
on a card in an associated card display mechanism (Photograph
214140). The labels of the contacts on this card must, however,
be interpreted in accordance with indicating lights on the
front panel of the machine to obtain the proper answer to the
design problem.
In about eighty per cent of the possible problems
that can be set up on the machine, the solution it gives will
be minimal in contacts, i.e., the number of contacts in the
circuit cannot be reduced. In the remaining twenty per cent,
the designs cannot be simplified by more than one contact and
may, in fact, be minimal.
The sixteen input switches correspond to the six-
teen possible states of the four relays in the circuit being
designed. Each of these switches has three positions labeled
"open," "don't care" and "closed". If, for a given state of
these relays, it is desired that the circuit be open, the
corresponding switch is set in the "open" position; similarly
for the "closed" position. If it does not matter whether the
circuit is open or closed in this state, the switch is set at
"don't care". The Synthesizer takes advantage of any switches
in the "don't care" position in attempting to reduce the
number of contacts used in the final circuit. It fills in
these unspecified states in such a way as to minimize contact
requirements. This ability to handle partially specified
switching problems is one of the main features of the Synthesi-
zer and enables it to solve problems for which analytic methods
are at present ill-adapted.
In addition to the direct circuit designing pro-
cedure outlined above, the Synthesizer is equipped with
controls for other modes of operation. It may be run at
low speed for demonstration purposes, it may be set up to
find all the circuits in its card file satisfying the re-
quirements (not just the one with the smallest number of
contacts) and it may be used to determine various mathematical
properties associated with switching functions.
By changing the paper tape and the card file used
(but without any internal change within the electrical part
of the machine) it can be made to solve design problems in-
volving diode circuits instead of relay contact circuits.
By a still different tape and set of cards it can minimize
the number of transfers in relay circuits instead of the
number of contacts. With suitable tape and card file, it can
solve a variety of other similar problems.
The Synthesizer represents a first step toward
machine design of switching circuits. Unfortunately, although
the method used in the Synthesizer may be generalized in prin-
ciple to circuits involving five or more variables, the time
for solution increases at an alarming rate. With five vari-
ables it would take many thousand times as long to obtain a
solution. The card file and the tape would be about two thousand
times their present size and would require many man-years
to construct. Consequently, a direct generalization of the
Synthesizer is hardly indicated, even with the high speeds
available in electronic computing gear.
Speed of Solution With Random Problems
An idea of the time required for the Synthesizer
to solve problems may be obtained from some tests with random
settings of the input switches. Using a book of random num-
bers, ten sets of sixteen random binary digits were obtained.
These were set up as input switch settings using 0 to mean
closed and 1 open, and the time required for the machine to
solve each of these problems was measured. The following table
gives the results of this test.
Binary Digits       Solution     Trans-      No. of     Time of
(Switch Settings)   Circuit No.  formation   Contacts   Solution

0 0 1 1             #279         w'  w       8          4min-10sec.
0 0 0 0                          x'  z
0 1 1 1                          y   y
1 1 0 0                          z'  x

1 0 0 1             #177         w'  x       6          1min-10sec.
0 0 1 0                          x   y
0 0 1 0                          y   z
1 1 1 1                          z   w

1 0 1 0             #306         w   z       10         7min-20sec.
0 0 0 1                          x'  y
1 0 0 1                          y'  w
0 0 0 1                          z'  x

0 0 0 1             #261         w   z       10         7min-7sec.
1 0 0 0                          x'  w
1 0 0 1                          y   y
1 1 1 0                          z'  x

1 0 1 0             #212         w   x       11         9min-6sec.
0 1 1 1                          x'  w
1 0 0 1                          y'  y
0 1 0 0                          z   z

0 1 0 1             #137         w   w       9          6min-32sec.
0 0 0 0                          x'  x
1 0 1 1                          y   y
1 1 1 0                          z   z

0 1 0 0             #75          w   x       9          6min-10sec.
1 0 1 1                          x   z
1 1 1 0                          y'  y
1 1 1 0                          z   w

0 0 0 0             #240         w'  y       5          38sec.
0 0 1 1                          x'  w
0 0 1 1                          y'  x
1 0 1 1                          z   z

0 1 0 0             #193         w   z       8          4min-30sec.
1 0 1 0                          x'  y
1 0 1 0                          y   w
1 1 1 0                          z   x

1 0 0 0             #34          w   x       9          5min-50sec.
0 1 1 1                          x'  z
1 0 1 1                          y   w
1 0 1 1                          z   y
The Solution Circuit Number refers to the Table
in MM-52-180-45, E. F. Moore, "A Table of Four Relay Two Terminal
Contact Networks". The Transformation indicates the
required change of variables in interpreting the numbered
circuit of this Table. The average solution time for these
ten completely specified random functions was 5 min.-15 sec.,
and the average number of contacts in the solution was 8.5.
A second test was run with partially specified
random functions. Again using the Table of Random Numbers,
four switches were chosen at random for "don't care" settings,
the remaining switches being given random "open" or "closed"
settings. This was done four times, leading to the following
results:
Binary Digits
(Switch Settings)   Solution     Trans-      No. of     Time of
D = "Don't Care"    Circuit No.  formation   Contacts   Solution

D 1 0 1             #334         w   w       6          3min-5sec.
0 D 0 0                          x   x
D 1 0 D                          y   y
0 0 0 0                          z   z

D 0 1 D             #189         w'  w       7          6min-30sec.
D 1 0 1                          x   z
0 1 0 0                          y   y
D 0 1 0                          z   x

0 1 0 D             #178         w   y       8          7min-25sec.
0 D 1 1                          x'  w
D 0 0 1                          y   z
D 0 1 1                          z'  x

0 0 1 D             #58          w   y       3          12sec.
0 D D 1                          x   w
D 0 1 1                          y'  z
1 0 1 1                          z'  x
The average time of solution for these problems with four
unspecified states was 4 min.-20 sec., with an average of 6
contacts.
Finally, a test was run with random problems having
eight unspecified ("don't care") states. These results were
as follows:
Binary Digits
(Switch Settings)   Solution     Trans-      No. of     Time of
D = "Don't Care"    Circuit No.  formation   Contacts   Solution

0 D 1 D             #204         w   w                  55sec.
D D D 0                          x   z
D 0 0 1                          y   y
D 0 0 D                          z   x

0 D D 1             #179         w   y       6          2min-55sec.
1 D 1 D                          x   x
1 0 1 0                          y'  z
D D D D                          z'  w

0 0 D D             #5*          w   y                  40sec.
0 0 D 1                          x   x
1 D D 1                          y   w
0 D D D                          z   z

D D D D             #79          w'  y                  3min-15sec.
1 1 D D                          x'  z
D 1 1 0                          y   x
D 1 0 1                          z   w
The average solution time here was 1 min.-56 sec., and the
average number of contacts was 4.5.
The following table summarizes these average figures:
                              Completely   Unspecified   Unspecified
                              specified    in 4 states   in 8 states
Average time                  5min-15sec   4min-20sec    1min-56sec
Average number of contacts    8.5          6             4.5
With still more "don't care" states the solution time and
average number of contacts would undoubtedly decrease still
further.
General Theory of Operation
The Relay Synthesizer deals with Boolean functions
of four variables. Each of the variables has two possible
values, 0 or 1; in conjunction there are $2^4 = 16$ sets of values
or "states" of the variables. For each of these states, a
function of these variables can be either 0 or 1. Thus there
are $2^{16} = 65{,}536$ different Boolean functions of four variables.
It is known that these 65,536 functions can be subdivided into
402 classes or "types" of functions. Two functions are said to
be of the same type if one may be obtained from the other by
negating some of the variables or permuting some of the
variables or both. Thus the function
w + x'(y + z)
is of the same type as
x' + z(w + y')
or
w' + y(x' + z').
All functions of a given type present substantially the same
design problem. If a good circuit is found for one of them,
it applies equally to all other functions of the same type,
for it is necessary only to relabel contacts properly and it
will represent these other functions.
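The figure of 402 types can be checked independently with Burnside's lemma. This is a standard counting technique and is not claimed to be how the figure was originally obtained. The method averages, over all $4! \times 2^4 = 384$ ways of permuting and negating the four variables, the number of functions left unchanged. A function is unchanged by a relabeling exactly when it is constant on each cycle of states, so that count is 2 raised to the number of cycles. A minimal sketch, with illustrative names:

```python
from itertools import permutations, product

N_VARS = 4
STATES = range(2 ** N_VARS)

def relabel(state: int, perm: tuple[int, ...], negate: tuple[int, ...]) -> int:
    """Permute the variables of a state, then negate the selected ones."""
    out = 0
    for i, p in enumerate(perm):
        bit = (state >> p) & 1          # variable p of the old state becomes variable i
        out |= (bit ^ negate[i]) << i
    return out

def count_cycles(mapping: dict[int, int]) -> int:
    """Number of cycles of a permutation of the states."""
    seen, cycles = set(), 0
    for start in mapping:
        if start not in seen:
            cycles += 1
            s = start
            while s not in seen:
                seen.add(s)
                s = mapping[s]
    return cycles

total, group_size = 0, 0
for perm in permutations(range(N_VARS)):
    for negate in product((0, 1), repeat=N_VARS):
        mapping = {s: relabel(s, perm, negate) for s in STATES}
        total += 2 ** count_cycles(mapping)   # functions fixed by this relabeling
        group_size += 1

print(group_size)           # 384
print(total // group_size)  # 402 types
```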
In the memorandum referred to above, circuits are
given for these 402 types of functions. At present writing,
331 of these have been proved to be minimal in contacts; the
remaining 71 are known to be within one contact of being
minimal. This catalog of circuits is a key part of the design
procedure in the Relay Synthesizer.
The reader may wonder why the Synthesizer is necessary
for designing circuits when such a catalog is available.
Why not merely find the circuit corresponding to the desired
function in the catalog? The answer is that it is not at all
easy to find the type or class to which a given function belongs,
even when the function is completely specified. If the
desired function is not completely specified (has one or more
"don't care" states), there will in general be many types of
functions consistent with the requirements, and it becomes
extremely difficult to locate these in the catalog. The Synthesizer
is, in fact, a machine for determining the type of a
fully specified function and (in the partially specified case)
the possible type having the fewest contacts in its
catalog circuit.
A block diagram of the Synthesizer is shown in
Figure 1, and indicates the main functional organization. The
specifications of the desired circuit are set up on the input
switches in the right-hand box. The catalog of the 402 types
of functions appears on a paper tape in the left-hand Tape
Input box. Each function occupies six lines of tape. The
first four lines give the states for which the function is
closed. The fifth line gives the number (in binary form) of
closed states for the function, and the sixth line contains a
special hole marking the end of data relating to this function,
i.e., it acts as a punctuation mark separating functions on
the tape.
In solving a particular problem, the tape functions
are examined one by one in the machine. All permutations and
negations of a particular tape function are compared with the
desired specifications as set up on the input switches. When
an exact match is found, the machine stops, and the tape function,
together with the permutation being applied to it, represents
a solution to the problem.
In the block diagram this is carried out as follows:
The tape function is stored in the memory relays. A permuting-negating
network applies the equivalent of the various possible
permutation and negation operations to these data. The results
of each permutation-negation operation are compared with the
input switches in a comparison circuit to see if a match has
occurred. If not, an error signal is fed back to the permutation
sequencer, causing it to advance to the next permutation
operation, which is in turn compared, and so on, until all of the
384 possible permutations and negations have been tested. Because
of short-cut circuits to be described later, the machine
frequently skips many of these, reducing the solution time
considerably.
When the set of operations on a particular function
is exhausted, the permutation sequencer sends a signal back
to the tape driving circuit, and the next function is read
into the memory for test. This signal also causes the card
display device to drop another card from its stack. The card
displayed always corresponds to the function being tested in
the machine and shows the most efficient known circuit for
that function.
The permutation indicator is controlled by the per-
mutation sequencer and indicates in lights the permutation
currently being tested. When the machine stops at a solution,
these lights show what permutation and negation must be ap-
plied to the circuit on the card to solve the problem at hand.
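The search procedure described above can be summarized as a nested loop. This Python sketch models the logic only, not the relay circuits. Two things in it are assumptions: the catalog is a hypothetical list of (name, truth table) pairs already in tape order, and each input switch is represented as open, closed or "don't care". Because the catalog is scanned in order of increasing cost, the first match returned is also the cheapest catalog circuit that fits.

```python
from itertools import permutations

OPEN, CLOSED, DONT_CARE = 0, 1, None   # settings of a three-position input switch


def relabel(table, perm, neg):
    """Truth table (16-bit int) of the function with its variables permuted
    by `perm` and complemented according to the 4-bit mask `neg`."""
    out = 0
    for s in range(16):
        bits = [(s >> i) & 1 for i in range(4)]
        src = sum(bits[perm[i]] << i for i in range(4)) ^ neg
        out |= ((table >> src) & 1) << s
    return out


def matches(table, spec):
    """True when the function agrees with every switch not set at "don't care"."""
    return all(want is DONT_CARE or ((table >> s) & 1) == want
               for s, want in enumerate(spec))


def synthesize(catalog, spec):
    """Return the first (name, perm, neg) that satisfies `spec`, or None.

    `catalog` is a list of (name, truth_table) pairs in tape order, i.e.
    sorted by the chosen cost, so the first fit is also the cheapest."""
    for name, table in catalog:
        for perm in permutations(range(4)):
            for neg in range(16):
                if matches(relabel(table, perm, neg), spec):
                    return name, perm, neg
    return None
```

To list every solution rather than only the first, the scan would continue past each match instead of returning.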
In problems involving "don't cares," the Synthesizer
could be used to find all of the solutions in succession,
but to use all this information in designing a circuit, it would
be necessary to compare all the circuits obtained and see which
one is preferred. Since the grounds for preferring one circuit
over another have been taken to be economy of contacts, the need
for this comparison step has been eliminated by arranging the
functions on the tape in order of increasing number of
contacts, so that the first solution arrived at will automatically
be the preferred one. Arranging the functions on the tape
in terms of any other criterion will cause the Synthesizer to
design circuits based on this criterion. If, for instance, it
is desired to design relay circuits using as few springs as
possible, or to design diode logic circuits using as few diodes
as possible, it is only necessary to arrange the functions on
the tape in order of number of springs or number of diodes,
respectively.
Circuit Operation
Figure 2 is the circuit diagram of the Synthesizer.
The layout of subcircuits corresponds roughly to the block
diagram of Figure 1. We will first describe the circuit operation
in the logically simplest mode of operation: the normal mode
with all short-cut circuits eliminated. In Figure 2, then, we
assume the mode of operation switch in the "Normal" position
N, the relay Q operated (eliminating permutation short cuts)
and the number of state switches M set at "Normal".
Since the Synthesizer is essentially a closed-loop
system, it is difficult to find a point at which to start a
description of its operation. It is perhaps simplest to assume
that the machine has just finished testing one function
on the tape. The relay H may then be assumed to have just
operated, locking in through the make contact on Rs, since the tape reader
will be at the division line between functions and Rs will consequently
be operated. Operation of H releases the hold on the memory
relays (M0, M1, ..., M15) and also the hold on the relays of the steering
counter, thus resetting this counter to zero. It also applies voltage to the teletype
magnet, which a moment later will pull free of the tape and
hence release Rs. This releases H and reconnects the holds
of the steering counter and the memory relays. It also establishes
a path to the slow relay SO through its own back
contact. SO now acts like a slow buzzer, producing
pulses at a rate of about six per second, and relay U follows
these pulses through the SO make contact.
The pulses produced by U operate the teletype magnet,
advancing the tape line by line until it reaches the line with an Rs
hole, at which point the back contacts on Rs open both the
buzzer circuit to SO and the teletype magnet circuit through U.
The pulses produced by U are also fed into the three-stage
binary counter consisting of three WZ pulse dividers, WM1ZM1,
WM2ZM2 and WM4ZM4. This counter, therefore, keeps track of the
line of tape, counting from the last division between two tape
functions (the Rs hole). This counter controls the steering trees
leading into the memory relays M0, M1, ..., M15 and the number of
state relays V1, V2, V4, V8. The first line of tape after the Rs
line is fed into M0, M1, M2, M3, the second line into M4, M5, M6, M7,
the third into M8, M9, M10, M11, the fourth into M12, M13, M14, M15,
and the fifth into V1, V2, V4, V8. A section of the tape is
shown in Figure 3.
The completion of this tape reading operation, indicated
by closure of Rs, puts ground on lead 106 leading into
the permutation-negation network.
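The tape layout and the steering of successive lines can be modeled as follows. This Python sketch makes three assumptions that the memorandum does not fix. First, holes R0 to R3 feed the lowest- to highest-numbered relay of each group of four. Second, a line is a tuple of 0/1 hole flags. Third, a count of 16 closed states is punched as 0 on the fifth line, because only four holes are available; the machine recovers the sixteen case through V16, as described under Short-Cut Operation. The function names are hypothetical.

```python
def encode_function(table):
    """Six tape lines for one catalog function; each line is a tuple of hole
    flags (R0, R1, R2, R3, Rs). Hole order within a line is assumed."""
    lines = []
    for group in range(4):                    # lines 1-4: M0-M3, M4-M7, M8-M11, M12-M15
        nibble = (table >> (4 * group)) & 0xF
        lines.append(tuple((nibble >> i) & 1 for i in range(4)) + (0,))
    count = bin(table).count("1") % 16        # only four holes: 16 is punched as 0
    lines.append(tuple((count >> i) & 1 for i in range(4)) + (0,))  # line 5: V1-V8
    lines.append((0, 0, 0, 0, 1))             # line 6: Rs end-of-function mark
    return lines


def read_function(tape):
    """Steer successive lines into the memory relays M0-M15 and the
    number-of-states relays V1, V2, V4, V8, counting lines like the
    steering counter and stopping at the Rs hole."""
    memory, v_relays, line_no = [0] * 16, [0] * 4, 0
    for holes in tape:
        if holes[4]:                          # Rs: end of this function's data
            break
        if line_no < 4:
            memory[4 * line_no: 4 * line_no + 4] = holes[:4]
        elif line_no == 4:
            v_relays = list(holes[:4])
        line_no += 1                          # steering counter advances one line
    return memory, v_relays
```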
Permuting and Negating Circuits
These circuits enable the machine to apply the
384 negation and permutation operations to the tape function
stored in the memory to compare it with the desired function
set on the input switches.
The negation-permutation sequencer consists of nine
WZ pairs connected in a form of counting circuit which can go
through 384 different states. Starting from the high-speed
(pulsed) end of this circuit, the first five WZ pairs, E, D, B,
C and A, relate to permutations and can go through twenty-four
states corresponding to the 4! = 24 permutations of the four
variables. The other four stages, w, x, y, z, relate to negating
the variables and can go through sixteen states corresponding
to the sixteen ways of negating four variables. In combination
this gives 16 x 24 = 384 states.
In the circuit, imagine Q operated, F9 and F16 released
and F3 pulsed, so that a series of pulses is
applied to line 109. The negation-permutation sequencer will
then proceed through the 384 negation-permutation operations.
This sequence is shown in the accompanying Table I for the
first twenty-four of these, i.e., a full set of permutations.
At the twenty-fourth step this sequence repeats for the permutation
relays, but a pulse is applied at lead 250, advancing
the negating relays one step. The negating relays go through
the sequence shown in Table II, advancing one step after the
permuting relays have gone through a full set of permutations.
In this manner the full set of 16 x 24 combinations is exhausted.
Table I
Sequence of Permutations
(1 means operated)

No.   Relays            Relays            W X Y Z
      WA WB WC WD WE    A  B  C  D  E     become
 0    0  0  0  0  0     1  1  1  1  1     W X Y Z
 1    0  0  0  1  1     1  1  1  0  0     W Y Z X
 2    0  0  0  1  0     1  1  1  0  1     W Z Y X
 3    0  0  1  1  1     1  1  0  0  0     W Y X Z
 4    0  0  1  1  0     1  1  0  0  1     W Z X Y
 5    0  0  0  0  1     1  1  1  1  0     W X Z Y
 6    0  1  0  0  0     1  0  1  1  1     Y X W Z
 7    0  1  0  1  1     1  0  1  0  0     Z Y W X
 8    0  1  0  1  0     1  0  1  0  1     Y Z W X
 9    0  1  1  1  1     1  0  0  0  0     X Y W Z
10    0  1  1  1  0     1  0  0  0  1     X Z W Y
11    0  1  0  0  1     1  0  1  1  0     Z X W Y
12    1  1  0  0  0     0  0  1  1  1     X Y Z W
13    1  1  0  1  1     0  0  1  0  0     Y Z X W
14    1  1  0  1  0     0  0  1  0  1     Z Y X W
15    1  1  1  1  1     0  0  0  0  0     Y X Z W
16    1  1  1  1  0     0  0  0  0  1     Z X Y W
17    1  1  0  0  1     0  0  1  1  0     X Z Y W
18    1  0  0  0  0     0  1  1  1  1     X W Z Y
19    1  0  0  1  1     0  1  1  0  0     Y W X Z
20    1  0  0  1  0     0  1  1  0  1     Z W X Y
21    1  0  1  1  1     0  1  0  0  0     Y W Z X
22    1  0  1  1  0     0  1  0  0  1     Z W Y X
23    1  0  0  0  1     0  1  1  1  0     X W Y Z
Table II
Sequence of Negations

Relays         Relays         Variables
Ww Wx Wy Wz    W  X  Y  Z     W X Y Z become
0  0  0  0     1  1  1  1     W  X  Y  Z
0  0  0  1     1  1  1  0     W  X  Y  Z'
0  0  1  1     1  1  0  0     W  X  Y' Z'
0  0  1  0     1  1  0  1     W  X  Y' Z
0  1  0  0     1  0  1  1     W  X' Y  Z
0  1  0  1     1  0  1  0     W  X' Y  Z'
0  1  1  1     1  0  0  0     W  X' Y' Z'
0  1  1  0     1  0  0  1     W  X' Y' Z
1  1  0  0     0  0  1  1     W' X' Y  Z
1  1  0  1     0  0  1  0     W' X' Y  Z'
1  1  1  1     0  0  0  0     W' X' Y' Z'
1  1  1  0     0  0  0  1     W' X' Y' Z
1  0  0  0     0  1  1  1     W' X  Y  Z
1  0  0  1     0  1  1  0     W' X  Y  Z'
1  0  1  1     0  1  0  0     W' X  Y' Z'
1  0  1  0     0  1  0  1     W' X  Y' Z
At the end of this sequence, a ground is applied
to line 135 which initiates reading in a new function.
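As a consistency check on Tables I and II, the relay codes can be transcribed and run through the same nesting the sequencer uses: the permuting part advances fastest, and the negating part advances once every twenty-four steps. The string encoding in this Python sketch was chosen for the sketch itself; the codes are those of the two tables.

```python
# W-relay codes transcribed from Tables I and II ("1" = W relay operated).
AB_SEQ = ["00", "01", "11", "10"]                        # WA WB
CDE_SEQ = ["000", "011", "010", "111", "110", "001"]     # WC WD WE
BECOMES = ["WXYZ", "WYZX", "WZYX", "WYXZ", "WZXY", "WXZY",
           "YXWZ", "ZYWX", "YZWX", "XYWZ", "XZWY", "ZXWY",
           "XYZW", "YZXW", "ZYXW", "YXZW", "ZXYW", "XZYW",
           "XWZY", "YWXZ", "ZWXY", "YWZX", "ZWYX", "XWYZ"]
GRAY2 = ["00", "01", "11", "10"]
NEG_SEQ = [wx + yz for wx in GRAY2 for yz in GRAY2]      # Ww Wx Wy Wz


def sequence():
    """Yield (step, WA..WE code, 'W X Y Z become' order, Ww..Wz code) in the
    order the sequencer visits them: permutations fastest, then negations."""
    step = 0
    for neg in NEG_SEQ:
        for i, ab in enumerate(AB_SEQ):
            for j, cde in enumerate(CDE_SEQ):
                yield step, ab + cde, BECOMES[6 * i + j], neg
                step += 1


def negated_variables(neg_code):
    """Variables complemented for a negation code (W relay operated)."""
    return [v for v, bit in zip("WXYZ", neg_code) if bit == "1"]
```

Running the sketch confirms three properties of the tables. The twenty-four "become" orders are all different. The 384 steps are all distinct. Within each group of six steps in which A and B stay fixed, the letter W keeps the same position.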
It may also be noted that if relay Q is released
and F16 is operated, a ground is applied directly to line 250,
the input to the negating part of the counter. This will
cause the counter to skip a set of permutations and advance
directly in the negating sequence by one step. Operation of
F16 also releases the plus side of the permutation relays in
the sequencer, resetting them to zero. The function of F16
is to short-cut some of the calculation in certain cases, as
will be described later.
In a similar way, operation of F9 with Q released
advances the WA and WB parts of the permutation sequence by
one step, skipping a subset of six permutations in which
WC, WD and WE take part. F9 releases the plus to these three
WZ pairs, resetting them to zero. This also is used for
short-cut purposes.
The permuting and negating relays A, B, C, D, E and
W, X, Y, Z are operated from back contacts of the correspond-
ing W relays in the WZ pairs of the sequencer. Thus they as-
sume the complementary states as shown in Tables I and II.
The function of these nine sets of relays is to interchange
sixteen leads representing the function in the memory relays
in accordance with the permutation and negation in the se-
quencer.
The logical organization of this circuit can be
represented in a symbolic form by Figure 4, which indicates
the effect of the negating and permuting relays on the variables
of the tape function (not the effect on the sixteen leads).
Thus, the W relay negates the variable W when released, the X
relay negates X, etc. The A relay interchanges W and X and
also Y and Z when released, the B relay interchanges the vari-
ables now appearing (after the possible A interchange) on the
first and third lines, etc. It will be found that the twenty-
four combinations of A, B, C, D, and E produced by the sequencer
(Table I) lead to the twenty-four permutations of the four
variables shown in Table I.
Now the circuit does not work with the four Boolean
variables but with sixteen lines representing the sixteen
states of the four variables. Negating a variable, say W,
corresponds to interchanging the eight lines (or states) for
which W is 1 with the corresponding eight lines for which W
is zero. Thus in the permuting circuit, the W negation box
of Figure 4 becomes eight reversing or interchanging circuits
operated by the relays W1, W2, W3, W4. A similar statement
applies to the negation of the other variables and the permuting
of the variables by the A, B, C, D and E relays.
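The step from variables to lines can be made concrete. This Python sketch assumes the lines are numbered by state, s = 8w + 4x + 2y + z; that numbering was chosen for the sketch and is not taken from Figure 4. Under it, negating one variable interchanges eight pairs of lines. Exchanging two variables interchanges only the four pairs of lines in which those two variables differ.

```python
def negation_interchanges(var_bit):
    """Pairs of lines exchanged when one variable is negated.

    Lines are numbered by state s = 8w + 4x + 2y + z (assumed numbering);
    var_bit is 3 for W, 2 for X, 1 for Y and 0 for Z."""
    mask = 1 << var_bit
    return [(s, s | mask) for s in range(16) if not s & mask]


def swap_interchanges(bit_a, bit_b):
    """Pairs of lines exchanged when two variables trade places; only the
    states in which the two variables have different values move."""
    pairs = []
    for s in range(16):
        if (s >> bit_a) & 1 == 0 and (s >> bit_b) & 1 == 1:
            pairs.append((s, s ^ (1 << bit_a) ^ (1 << bit_b)))
    return pairs
```

For example, `negation_interchanges(3)` pairs line 0 with line 8 through line 7 with line 15. No interchange produced by a swap ever involves line 0 or line 15.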
To summarize, the sequencer can go through 384
states representing the 384 permutations and negations. The
negating-permuting network sets up the corresponding interchanges
of the sixteen lines from the memory to the input
switches. At the memory end, these lines are given plus or
minus voltage according as the memory function is open or
closed. At the input switch end, after the permutation and
negation, these voltages are compared with the settings of
the input switches.
There are two types of comparison circuits. The
first type, Figure 5, applies to switches 0, 7, 8 and 15.
It will be seen that F0 will operate if the lead from the permuting
network is positive and the switch is set at "closed,"
or if the lead is negative and the switch is set at "open,"
i.e., if there is a disagreement between the switch setting
and the value coming in from the permuting network. If the
switch is set at "don't care," F0 will not operate. It will
also be seen that the red and green lights will indicate
"closed" and "open" settings of the switch respectively,
while if the switch is set at "don't care" the red or green light will
indicate minus or plus coming in from the permuting network.
The comparison circuit for the other switches is
somewhat different. There are two relays F1 and F2 common
to all the other switches. If a particular switch is set at
"closed," the line from the permuter goes through a diode
to F1, the other side of F1 being minus (when the test is
made). Thus F1 will operate if a plus appears on the line
from the permuter (disagreeing with the "closed" position
of the switch). If the switch is set at "open," the path
from the permuter goes through the same diode but in the op-
posite direction to F2, whose other side is connected to
plus. Hence F2 will operate if a minus comes in from the
permuter. The red and green lamps are connected substantially
as before.
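The decision made by the comparison circuits can be written as a small function. It follows the text: plus on a lead means the permuted tape function is open in that state, and minus means it is closed. Switches 0, 7, 8 and 15 have their own failure relays. The remaining switches share F1, which operates on plus against "closed", and F2, which operates on minus against "open". The data representation in this Python sketch is an assumption.

```python
PLUS, MINUS = "+", "-"                 # plus = open, minus = closed at the memory end
DEDICATED = {0: "F0", 7: "F7", 8: "F8", 15: "F15"}


def failure_relays(leads, switches):
    """Set of failure relays that operate for one permutation-negation.

    leads[s]    -- PLUS or MINUS arriving from the permuting network at switch s
    switches[s] -- "open", "closed" or "don't care"
    """
    operated = set()
    for s, (volts, setting) in enumerate(zip(leads, switches)):
        if setting == "don't care":
            continue                   # a "don't care" switch never causes a failure
        disagree = ((setting == "closed" and volts == PLUS) or
                    (setting == "open" and volts == MINUS))
        if not disagree:
            continue
        if s in DEDICATED:
            operated.add(DEDICATED[s])
        elif setting == "closed":
            operated.add("F1")         # plus reached a switch set at "closed"
        else:
            operated.add("F2")         # minus reached a switch set at "open"
    return operated
```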
Returning now to the description of the operating
sequences in the machine, we recall that the completion of
tape reading of a function into the memory was signified by
closure of Rs. This applies ground at lead 106 into a long
"equality chain" of contacts. This chain is closed only if
all of the W relays in the WZ pairs of the sequencer agree
in position with their corresponding Z relays. This being
true, ground is applied to the permuting and negating network
and, as already described, one or more of the F relays
(F0, F7, F8, F15, F1, F2) will operate unless the tape function
as permuted through the network agrees with the input
function. Assuming there is a disagreement, at least one
of F3, F9, F16 will operate, grounding the input to the
negation-permutation sequencer. This advances the W relays
of the sequencer one step in the sequence, and causes a
disagreement between at least one of the W relays and its
corresponding Z relay in the WZ pairs. This disagreement,
in turn, opens the "equality chain," releasing the F relays,
which in turn removes the ground from the sequencer and
allows its Z relays to follow their corresponding W relays.
When equality has again been established, ground is again
applied through the "equality chain" to the permuting network
and the next permutation of the sequence (now set up on the
permuting network) is tested in the same way. This cycle of
operations continues until the full set of permutations and
negations has been tested. After the last permutation, the
next ground goes through a Zw contact and the mode of opera-
tion switch to operate H, signifying the completion of tests
on the current function and initiating reading the tape for
the next function as previously described.
If, at some point, the permuted tape function
matches the input function, no F relay will operate and the
cycle is stopped. Relay J will operate and, in turn, L
through the chain of back contacts on the F relays. The
operation of L rings the gong indicating a solution, and
pulses the message register for counting purposes.
Short-Cut Operation
We now describe the short-cut provisions. If the
short-cut eliminator is "off," relay Q will release, rear-
ranging the inputs to the sequencer. In the permuting net-
work it will be seen that the lines on the zero level and
on the 15 level are not switched after the vertical column
of Z contacts, i.e., after emerging from the negating part of
this circuit. This means that if a disagreement occurs on
either of these lines, it will persist throughout all the
permutations, which only change the switches A, B, C, D and
E in this network. Hence, in case of such a disagreement it
is not necessary to test all of these permutations but the
machine can proceed immediately to the next negation saving
a great deal of time.
In the circuit, when Q is released, operation of
F0 or F15 pulses directly into the negating part of the
sequencer and resets the permuting part to zero.
In a similar manner, it will be seen that the lines
at the 7 and 8 levels in the permuter are not switched after the
B contacts. This means that a disagreement on either of these
lines, indicated by operation of F7 or F8, will persist over
the subset of six permutations in which C, D and E change.
Hence it is unnecessary in such a case to test each of these
individually, and the machine advances to the next permutation
involving a change of A or B. In the sequencer, a ground is
applied at the input to the A, B stages and the C, D, E stages
are reset to zero. This is done by relay F9, which will operate
if either F7 or F8 indicates disagreement.
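The effect of the two short cuts on the sequencer can be modeled with three indices: the negation step (0 to 15), the A, B step (0 to 3) and the C, D, E step (0 to 5). This Python sketch models behavior, not the circuit, and the index representation is assumed. A disagreement on line 0 or 15 jumps to the next negation, and one on line 7 or 8 jumps to the next A, B step. Any other disagreement advances by one. The helper `permute_state` shows why this is safe under the state numbering s = 8w + 4x + 2y + z, which is also an assumption here. States 0 and 15 are unchanged by every permutation of the variables. States 7 and 8 are unchanged by every permutation that leaves the high-order variable in place.

```python
from itertools import permutations


def advance(state, failures, short_cuts=True):
    """Next sequencer position after a failed comparison, or None once all
    384 operations on the current tape function have been used up.

    state    -- (neg, ab, cde): negation step 0-15, A,B step 0-3, C,D,E step 0-5
    failures -- set of input-switch numbers on which the comparison failed
    """
    neg, ab, cde = state
    if short_cuts and failures & {0, 15}:
        neg, ab, cde = neg + 1, 0, 0          # next negation; permuting part reset
    elif short_cuts and failures & {7, 8}:
        ab, cde = ab + 1, 0                   # next A,B step; C,D,E reset
    else:
        cde += 1                              # ordinary single step
    if cde == 6:                              # carry from C,D,E into A,B
        ab, cde = ab + 1, 0
    if ab == 4:                               # carry from A,B into the negations
        neg, ab = neg + 1, 0
    return None if neg == 16 else (neg, ab, cde)


def permute_state(s, perm):
    """State reached when the variable positions are permuted (bit i <- bit perm[i])."""
    bits = [(s >> i) & 1 for i in range(4)]
    return sum(bits[perm[i]] << i for i in range(4))
```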
One further short-cutting device has been incorporated
in the machine. With each tape function is included, in
binary form, the number of states for which that function is
closed. As previously described, this number is stored in the
relays V1, V2, V4, V8, V16 when the function is read off the
tape. On the front panel of the machine are two seventeen-
point switches labeled Max and Min. The Min switch should be
set at a number equal to the number of input switches in the
"closed" position. The Max switch should be set at this
number plus the number of "don't cares". Now, regardless of
how the "don't cares" may be filled in, the number of closed
states will be within this range (including the end points).
A function from the tape could not possibly be satisfactory
unless its number of states lies within this range. The
machine is arranged to compare these numbers and, if this con-
dition is not satisfied, to skip the function completely and
go immediately to the next function on the tape.
The comparison is carried out in the "number of
states comparison circuit". The contacts on the V relays are
arranged in the topological dual of an ordinary tree. This
implies that if the number n is registered (in binary form)
in the V relays, then all of the vertical leads labeled zero
to n at the Min switch will be connected together, but the
two groups are not connected. It will be seen, therefore,
that if the number on the V switches lies in the range covered
by the Max and Min settings, then the Max and Min swingers
will not be connected. If the V number is outside this range
then the Max and Min swingers will be connected. If the Max
and Min swingers are connected, the operation of Rs closes a
path to operate H and start reading in a new function imme-
diately.
It is necessary to use five relays, V1, V2, V4,
V8 and V16, to represent all of the numbers from 0 to 16 inclusive,
but there were only four holes readily available on
the tape for reading into these relays. Consequently four
of the relays are read into directly through the steering
relays, and a special artifice is used to get the fifth digit
stored in V16.
Since the only case in which this digit equals 1
is when the number of states is 16, and all the other four
relays are then released, this relay is operated through the back
contacts of V1, V2, V4 and V8 in series. But since V1, V2,
V4 and V8 are also all released when the number of states is
0, a contact of M0 is also included in the operate path, to
distinguish between these two cases.
Without the short-cutting features the average time
of solution for a completely specified function would be over
an hour; with short cuts it is about five minutes.
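The number-of-states short cut and the V16 artifice reduce to a few lines of logic. In this Python sketch the relay states are 0/1 integers and the switch settings are strings; both representations are assumptions. The V16 rule is the one just described: the back contacts of V1, V2, V4 and V8 in series with a contact of M0. A tape function is worth testing only when its count lies between the Min and Max settings.

```python
def v16(v1, v2, v4, v8, m0):
    """Fifth binary digit: back contacts of V1, V2, V4, V8 in series with a
    contact of M0, so that a count of 16 is told apart from a count of 0."""
    return int(not (v1 or v2 or v4 or v8) and bool(m0))


def number_of_states(v1, v2, v4, v8, m0):
    """Closed-state count held in V1, V2, V4, V8 and V16."""
    return v1 + 2 * v2 + 4 * v4 + 8 * v8 + 16 * v16(v1, v2, v4, v8, m0)


def min_max_settings(switches):
    """Min = switches set at "closed"; Max = Min plus the "don't care" switches."""
    closed = switches.count("closed")
    return closed, closed + switches.count("don't care")


def worth_testing(n_states, min_setting, max_setting):
    """A tape function can only fit if its count lies in [Min, Max]."""
    return min_setting <= n_states <= max_setting
```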
Indicating Circuits
A set of indicating lights is provided which shows
the permutation and negation that must be applied to the tape
function (when a solution has been found) to transform it into
the function on the input switches. The eight negating lights
are connected in a simple fashion to the W, X, Y and Z coils.
If the W relay is out, for example, the W' lamp lights up by
a current through the W coil (not sufficient to operate the
W relay). If the W relay is operated, the W lamp lights up by
current through the Ww contact.
The circuit for the permuting part is more complex.
However, on tracing through the circuits it will be found
that the lights always receive proper voltages to indicate
the permutation set up on the A, B, C, D, E relays. For example,
in the first (identity) permutation, A, B, C, D and E
are all operated. It will be seen that the eight center
points between pairs of lamps receive the following voltages
(0 indicates floating):
+  -  0  0
0  0  +  -
Hence the diagonal series of lamps
W - - -
- X - -
- - Y -
- - - Z
will be lighted. Note that the lamps connected to floating
points receive half voltage by a sneak path through the two
lamps in series. This is not sufficient to illuminate them
perceptibly.
Another permutation indicating light circuit has
been provided for trouble shooting and for better observation
of the machine while in action. This consists of twenty-five
small neon lamps. Twenty-four of these correspond to the
twenty-four permutations of the variables. These are ar-
ranged in a rectangle six wide and four high. In operation
without short cuts, these lamps light sequentially from left
to right across the first row, then across the second, etc.
In short cuts due to the F0 and F15 relays the whole pattern
of twenty-four permutations is skipped. In short cuts due to
F7 and F8 a horizontal row in this display is skipped (only
the first lamp of the row going on).
The circuit controlling these lights consists of a
tree on relays A and B which selects the row and a second
tree on C, D and E which selects the column. Only the lamp
at the intersection point will go on. Sneak paths through
other lamps all involve at least three lamps in series and
the voltage is not sufficient for breakdown of such a series
combination.
The twenty-fifth lamp is connected to light up if
the C, D and E relays get into either of the two other pos-
sible states which do not correspond to permutations in the
regular sequence of operations. It can thus indicate certain
trouble conditions.
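The neon-lamp display can be modeled directly from Table I. The A, B code picks the row and the C, D, E code picks the column, in the order in which the sequencer visits them. The two C, D, E codes that never occur in the regular sequence light the twenty-fifth lamp. This Python sketch uses the W-relay codes of Table I; the A to E relays take the complementary states. It also assumes that rows and columns are numbered in sequencer order.

```python
AB_SEQ = ["00", "01", "11", "10"]                       # WA WB codes, Table I
CDE_SEQ = ["000", "011", "010", "111", "110", "001"]    # WC WD WE codes, Table I


def lit_lamp(ab_code, cde_code):
    """(row, column) of the neon lamp that lights, or "lamp 25" when the
    C, D, E relays are in a state outside the regular sequence."""
    if cde_code not in CDE_SEQ:
        return "lamp 25"
    return AB_SEQ.index(ab_code), CDE_SEQ.index(cde_code)
```

Of the eight possible C, D, E codes, only 100 and 101 reach the twenty-fifth lamp.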
Other Modes of Operation
With the mode of operation switch set in the P pos-
ition (periodic), the machine does not advance the tape after
the sequence of permutations and negations but periodically
goes through the tests on the function in the memory. In this
switch position the path to the H relay, which ordinarily ini-
tiates the tape reading process, is open. This mode is some-
times useful for trouble shooting.
In the S position (step-by-step), the machine tests
a permutation and then stops until the Run switch is operated
and released. The path which normally puts ground on the relays
F3, F9, F16 is opened and replaced by a contact on the Run
switch connected to a condenser. When the Run switch is off,
this condenser charges, and when the switch is pressed for a step in the
operation, it discharges through F3, F9 or F16. Only enough charge
is stored to operate these relays once. For the next step the
Run switch must be released and pressed again.
In the L mode (low-speed), the machine operates as
in the normal mode except at a much lower speed. This is
achieved in a fashion similar to the step-by-step operation
but with the function of the Run switch replaced by relay N.
The N relay is operated by the G relay which is connected in
a relaxation oscillator circuit using a gas tube. The conden-
sers charge up sufficiently to break down the gas tube which
operates G, closing its make contact and discharging the con-
denser which then starts recharging. This slow oscillation
of G causes N to oscillate slowly which, in turn, allows the
solution to proceed at a slow rate.
In Mode Q (self-restarting), the machine does not
stop at a solution but rings the gong, pulses the message
register, and then proceeds to the next permutation or negation
in the sequence. When a solution is reached in this mode,
the operation of relay L causes the message register to operate.
This releases relay K, which releases the message register and
also applies voltage to slow-operate relay G. Operation of G
energizes N, which in turn advances the permutation sequencer
one step and also energizes K. K locks in, releasing G and,
in turn, N, and the solution proceeds. This mode of operation
can be used to find all of the solutions to the given problem,
rather than just the first one.
C. E. SHANNON
E. F. MOORE
Att:
Appendices A and B
Photographs 214140 through 214143
Figures 1 through 5
Appendix A
Main Components and Their Functions
Relays and Other Electromagnetic Components
M0, M1, ..., M15   Memory relays. These register the
values of the function read off the tape for its sixteen
possible states. If Mi is operated, the function is closed
in state i.
W1, W2, W3, W4   Four parallel relays (to give sufficient
contacts). These relays negate the variable W of the tape
function. This is done in the negating and permuting network
by interchanging the eight leads corresponding to the
variable W = 1 with the corresponding eight leads for which
the variable W is zero.
X1, X2, X3, X4   Similar negating relays for the variable X.
Y1, Y2, Y3, Y4   Similar negating relays for the variable Y.
Z1, Z2, Z3, Z4   Similar negating relays for the variable Z.
A, B, C, D, E   Permuting relays. The function of these
relays is to permute the sixteen lines from the memory relays
according to the various permutations of the variables
W, X, Y and Z in the tape function. By suitable combinations
of operation and release of these five sets of relays, the
interchanges corresponding to any of the twenty-four
permutations are possible.
WwZw, WxZx, WyZy, WzZz, WaZa, WbZb, WcZc, WdZd, WeZe   WZ relays
arranged in a counting circuit to go through the 384
permutations and negations applied to the sixteen leads in
the permuter. These WZ pairs control the preceding W, X, ..., E
relays; thus W1, W2, W3, W4 are controlled by the Ww relay
of the WwZw pair.
F0, F7, F8, F15   Failure relays. Operation of F0, for
example, corresponds to failure of the permuted line coming
into switch 0 to match the value on input switch I0.
Operation of a failure relay causes the machine to proceed
to try another permutation or tape function.
F1, F2   These are failure relays which are operated by a
failure to match on any of the other switches not taken care
of specifically by F0, F7, F8 or F15.
F3, F9, F16   Secondary failure relays. These are operated
by the preceding failure relays and sort out the type of
short cut (if any) available. F16 causes the permuter to
advance to the next negation (skipping all permutations of
the current negation). F9 causes the permuter to skip the
current subset of six permutations out of the twenty-four,
advancing the AB part of the permutation one unit. F3 causes
an advance of only one in the permutation.
R0, R1, R2, R3, Rs   These relays are controlled by the five
fingers of the tape reading mechanism. For example, a hole in
the 2 row of the tape operates R2. R0, R1, R2, R3 carry
information to the memory relays M0, ..., M15 and also to the
number of state relays V1, V2, V4, V8. Rs marks the end of
data relating to one function on the tape.
S1, S2, S3, S4   Steering relays. These relays steer, by means
of four trees, the tape readings on R0, R1, R2, R3 into the
memory relays and the number of state relays V1, V2, V4, V8.
WM1ZM1, WM2ZM2, WM4ZM4   Steering counter. These relays
sequence the steering for successive lines of tape into the
appropriate memory and number of state relays.
V1, V2, V4, V8, V16   Number of state relays. These relays
register in binary form the number of
states for which the function currently
in the memory relays is closed.
WSZS A WZ pair for operating the card dis-
play unit. It causes successive func-
tions on the tape to operate alternately
the right and left solenoids Sr and Sl
of the display unit.
Sr, Sl   Right and left solenoids of the display
unit for releasing cards one by one
from the stack.
H End-of-permutations relay. This oper-
ates when the machine has tested all
permutations of the current tape func-
tion, and initiates analysis of the
next function on the tape.
L   Success relay. This operates when the
machine finds a solution to the prob-
lem.
Q   Short-cut eliminator. When operated,
this relay eliminates short cuts in the
permutation sequence.
J A delaying and checking relay in the
basic closed loop of the system. J
operates when all of the WZ pairs in
the permutation counter are in agree-
ment.
SO Slow-operate relay in a buzzer circuit
for producing pulses to step the tape
via relay U.
U Secondary relay operated by SO.
G   Reed relay in a slow relaxation oscillator
circuit for controlling low-speed operation via
secondary relay N.
N   Secondary relay controlled by G.
Control relay relating to low-speed and
self -restarting modes of operation.
Message register for counting solutions
to a problem.
A relay for connecting the 110 volt
supply only when the 24 volt supply is
on.
A bell operated by L which sounds when
a solution is found.
A five-hole teletype tape transmitter.
The standard functions are arranged on
tape in order of increasing numbers of
contacts.
Appendix B
Manually Operated Switches
Problem input switches. These switches
have three positions, "open," "don't
care," and "closed," and are set to cor-
respond to the desired characteristics
of the circuit to be designed in its
sixteen states.
Mode of operation switch. This is a
five-position switch which determines
the mode of operation of the machine.
In clockwise order these modes are:
P = Periodic. It continues cycling
through the same permutations with-
out advancing to the next function.
Q = Step-by-step. In this mode the
machine tests the permutations one
at a time under control of the key
switch. This switch must be pressed
once for each permutation.
N = Normal operation. Runs at regular
speed to the first solution and then
stops.
S = Self-restarting. At each solution,
it rings the gong and adds a count
to the message register, and then
advances to the next solution.
L = Low-speed. Similar to normal, but
at low speed for demonstration and
test purposes.
Short cut eliminator. In the "On" po-
sition this switch operates relay Q
and eliminates short cuts in the per-
muting sequence.
Next function button. Pressing this
pushbutton operates relay H, causing
the machine to advance to the next
function on the tape, omitting any re-
maining permutations of the current
function.
Starts the machine operating by
closing its fundamental operating
feedback loop.
Turns power on for the machine.
Both of these switches have seventeen
points labeled 0, 1, 2, ..., 16; the
Min switch has an additional point
labeled "Normal". In use, the Min
switch is set at the number of states
for which the function to be designed
is closed. The Max switch is set at
this number plus the number of "don't
care" states. The machine then skips
functions from the tape whose number
of closed states does not lie in this
range, thus shortening the solution
time. If the Min switch is set at
"Normal" this shortening feature is
eliminated.
April 3, 1954
Both experience and intuition suggest that a function
of time $f(t)$ which is bounded in amplitude range ($|f(t)| \le A$) and
in bandwidth (the spectrum vanishes for angular frequencies
greater than $\omega_0$) has a bounded slope, a bounded second derivative,
etc., and that there is a certain minimum time required to go
from a maximum negative to a maximum positive amplitude. Indeed,
one feels that the maximum slopes, the maximum higher derivatives
and the fastest rise times will occur with a sine wave having
the highest allowed amplitude and the highest allowed frequency.
This note establishes some theorems of this general sort.
Theorem I: Let the function $f(t)$, of integrable
square, be both amplitude limited and band limited:
$$|f(t)| \le A \quad \text{for all } t,$$
$$F(\omega) = 0 \quad \text{for } |\omega| > \omega_0,$$
where $F(\omega)$ is the Fourier transform of $f(t)$. Then, for all $t$,
$$|f'(t)| \le A\omega_0, \qquad |f''(t)| \le A\omega_0^2, \qquad \ldots, \qquad |f^{(n)}(t)| \le A\omega_0^n.$$
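Proof: If we can prove the theorem for a particular t,
it will follow for all t, since we can shift f(t) along
the time axis without affecting the assumptions of the
theorem or its conclusions. We will prove the theorem
for the particular time $t_0 = \pi/(2\omega_0)$. Now apply the sampling
theorem to f(t), expanding it in terms of its samples $a_n = f(n\pi/\omega_0)$:
$$f(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\sin(\omega_0 t - n\pi)}{\omega_0 t - n\pi} = \sum_{n=-\infty}^{\infty} a_n (-1)^n \frac{\sin\omega_0 t}{\omega_0 t - n\pi},$$
$$f'(t) = \sum_{n=-\infty}^{\infty} a_n (-1)^n \frac{\omega_0(\omega_0 t - n\pi)\cos\omega_0 t - \omega_0\sin\omega_0 t}{(\omega_0 t - n\pi)^2}.$$
At $t = t_0$ we have $\cos\omega_0 t_0 = 0$, $\sin\omega_0 t_0 = 1$ and $\omega_0 t_0 - n\pi = -\pi(n - 1/2)$, so that
$$|f'(t_0)| = \left|\sum_{n} a_n (-1)^n \frac{\omega_0}{\pi^2 (n - 1/2)^2}\right| \le \frac{\omega_0}{\pi^2}\sum_{n=-\infty}^{\infty}\frac{|a_n|}{(n - 1/2)^2},$$
since the absolute value on $a_n$ makes all terms positive.
Now $a_n$ is the value of f(t) at $t = n\pi/\omega_0$; consequently
$|a_n| \le A$. Hence, using $\sum_n (n - 1/2)^{-2} = 8\sum_{k \ge 0}(2k+1)^{-2} = \pi^2$,
$$|f'(t_0)| \le \frac{A\omega_0}{\pi^2}\sum_{n=-\infty}^{\infty}\frac{1}{(n - 1/2)^2} = A\omega_0.$$
This proves the desired result for the first derivative.
The results for higher derivatives can be obtained
inductively. f'(t) is band-limited, of integrable square, and,
as we have just shown, amplitude limited to $A\omega_0$. Hence f''
will be amplitude limited by
$$|f''(t)| \le (A\omega_0)\,\omega_0 = A\omega_0^2,$$
and by obvious induction
$$|f^{(n)}(t)| \le A\omega_0^n.$$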
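A quick numerical check of the key step, offered as a sketch rather than as part of the original note. The series $\sum_n (n - 1/2)^{-2}$ sums to $\pi^2$, and alternating samples $a_n = -A(-1)^n$ (those of $-A\cos\omega_0 t$) drive $f'(t_0)$ up to the bound $A\omega_0$. The function names and the truncation of the sampling series at $|n| \le N$ are illustrative assumptions.

```python
import math

def half_integer_sum(N):
    """Partial sum of 1/(n - 1/2)^2 over n = -N..N; the full series equals pi^2."""
    return sum(1.0 / (n - 0.5) ** 2 for n in range(-N, N + 1))

def derivative_at_t0(samples, omega0):
    """f'(t0), t0 = pi/(2*omega0), from samples {n: a_n} with a_n = f(n*pi/omega0).

    Term-by-term derivative of the (truncated) sampling series, using
    cos(omega0*t0) = 0, sin(omega0*t0) = 1 and omega0*t0 - n*pi = -pi*(n - 1/2).
    """
    return -sum(a * (-1) ** n * omega0 / (math.pi ** 2 * (n - 0.5) ** 2)
                for n, a in samples.items())
```

With any samples bounded by A the result stays within $A\omega_0$; the alternating worst case approaches it as more samples are kept, since the omitted tail of the series shrinks roughly like $2/N$.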
It will be noted that these bounds are the maximum
derivatives that would be obtained for a sine wave of the
highest allowed amplitude and frequency, $f(t) = A\sin\omega_0 t$.
While such a wave does not satisfy our integrable square assumption, it is possible to approximate the bounds given as
closely as desired by taking a sine wave of nearly top frequency and nearly top amplitude and multiplying it by a very
slowly decaying function of the type $(\sin kt)/kt$ (k very small).
This produces a function satisfying all the conditions with
maximum derivatives approximating to the upper bounds given.
Consequently these bounds are the best possible.
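The approximation argument can be tried numerically. The sketch below assumes the specific construction $g(t) = A(1-\delta)\sin((\omega_0 - k)t)\cdot\frac{\sin kt}{kt}$, which is band-limited to $\omega_0$ because the factor $\frac{\sin kt}{kt}$ has its spectrum confined to $|\omega| \le k$. The helper names, the grid, and the finite-difference step are illustrative choices, not part of the note.

```python
import math

def sinc(x):
    """sin(x)/x with the removable singularity at 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def near_extremal(A, omega0, delta, k):
    """g(t) = A(1-delta) sin((omega0-k) t) * sinc(k t), band-limited to omega0.

    |g| <= A(1-delta), and g decays like 1/t, so it is of integrable square.
    """
    a, w = A * (1.0 - delta), omega0 - k
    return lambda t: a * math.sin(w * t) * sinc(k * t)

def max_abs_derivative(g, t_min, t_max, steps, h=1e-6):
    """Largest |g'(t)| on a uniform grid, estimated by central differences."""
    best = 0.0
    for i in range(steps + 1):
        t = t_min + (t_max - t_min) * i / steps
        best = max(best, abs((g(t + h) - g(t - h)) / (2.0 * h)))
    return best
```

For $A = \omega_0 = 1$ and $\delta = k = 10^{-3}$, the largest slope found is about $0.998$. That is just under the bound $A\omega_0 = 1$, and it moves closer to the bound as $\delta$ and $k$ shrink.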
We now consider the problem of total rise of a function
over an interval. Again we would conjecture that the shortest
time for a rise from negative peak to positive peak amplitude
would be obtained by use of a sine wave of the greatest allowed
frequency and amplitude and hence would be $\pi/\omega_0$ seconds. We have
not been able to prove a result quite this good but will show the
following:
Theorem II: Under the same conditions on f(t) as in
Theorem I, it takes at least $3\tfrac{1}{12}/\omega_0 = 37/(12\omega_0)$ seconds for f(t) to
change from $-A$ to $+A$.
Proof: We will show that if $f(0) = -A$ and $f(t_3) = +A$,
then $f'(t)$ for $0 \le t \le t_3$ lies always under or on the
curve g(t) shown in Figure 1. This curve consists of
five sections: a straight line segment of slope $A\omega_0^2$; a
parabolic segment whose second derivative is $-A\omega_0^3$ and
which is tangent to the first segment and to the third
segment; a horizontal straight line at height $A\omega_0$. The
last two segments are reflections of the first two. The parabolic segment runs from $t_1 = 1/(2\omega_0)$, where it leaves the sloping line, to $t_2 = 3/(2\omega_0)$, where it meets the horizontal line.
In the first place, if $f(0) = -A$, then $f'(0) = 0$,
for f(t) is an entire function because of the band limitation, and if $f'(0)$ were not equal to zero, f(t) would run
outside its amplitude limit A in the neighborhood of zero.
Now
$$f'(t) = f'(0) + \int_0^t f''(\tau)\,d\tau \le 0 + \int_0^t |f''(\tau)|\,d\tau \le \int_0^t A\omega_0^2\,d\tau = A\omega_0^2\,t.$$
Hence $f'(t)$ lies under or on the sloping straight line
section. Also $f'(t) \le A\omega_0$, so it lies under the horizontal
segment. Next we show that it cannot lie in the small
triangular shaped region T between these two straight segments and the parabola. Suppose in contradiction
that $f'(t)$ did lie in this region, passing through a point
p at $t = t_0$ as shown. At $t_0$ we have either (A) $f''(t_0) \ge g'(t_0)$
or (B) $f''(t_0) < g'(t_0)$.
Assume first case (A). We may write
$$f'(t_2) = f'(t_0) + (t_2 - t_0)\,f''(t_0) + \int_{t_0}^{t_2}\!\!\int_{t_0}^{t} f'''(\tau)\,d\tau\,dt. \qquad (1)$$
We also have
$$g(t_2) = g(t_0) + (t_2 - t_0)\,g'(t_0) + \int_{t_0}^{t_2}\!\!\int_{t_0}^{t} g''(\tau)\,d\tau\,dt. \qquad (2)$$
The three right-hand members of (1) dominate the corresponding members of (2). $f'(t_0) > g(t_0)$ since we assumed
$f'(t_0)$ in the triangular region. $f''(t_0) \ge g'(t_0)$ since
we are assuming case (A). $f'''(t) \ge g''(t)$ since the g curve
has the greatest negative second derivative allowed by
Theorem I. We conclude that $f'(t_2) > g(t_2)$, and the f'
curve is over the horizontal line at $t_2$, a contradiction
which excludes case (A).
A similar argument applies to case (B), working backward to the point $t_1$. In equations (1) and (2), read $t_1$
for $t_2$ and notice that the coefficient $(t_1 - t_0)$ now becomes
negative. This allows the same argument to go through with
the condition reversed on the relation of $f''(t_0)$ and $g'(t_0)$,
and the resulting contradiction ($f'$ above the sloping line at $t_1$) excludes case (B), which
shows the impossibility of a curve in the triangular region.
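An exactly similar argument working backward from $t_3$
shows that $f'(t)$ must lie under or on the right-hand sloping
line and curved segment. Thus $f'(t)$ lies everywhere under or on g(t). In order that f(t) run from $-A$ at $t = 0$ to $+A$ at $t_3$,
the area under $f'(t)$ must be at least 2A, and hence so must that
under g(t). A simple integration of the g(t) curve shows that
this requires $t_3 \ge 3\tfrac{1}{12}/\omega_0$: the rising and falling portions (total duration $3/\omega_0$) contribute $2(A/8 + 5A/6) = 23A/12$, so the horizontal segment must supply the remaining $A/12$ and last at least $1/(12\omega_0)$ seconds. This proves the desired result.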
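The "simple integration" can be checked with exact rational arithmetic. The sketch below (a hypothetical helper, not taken from the memorandum) builds the five-segment bound g(t) from the slopes Theorem I allows. It then solves for the shortest $t_3$ at which the area under g(t) reaches 2A.

```python
from fractions import Fraction

def min_rise_time(omega0=Fraction(1), A=Fraction(1)):
    """Shortest t3 for which the area under the five-segment g(t) reaches 2A."""
    slope = A * omega0 ** 2           # straight segment: |f''| <= A*omega0^2
    curv = A * omega0 ** 3            # parabola: g'' = -A*omega0^3 (|f'''| bound)
    top = A * omega0                  # horizontal segment: |f'| <= A*omega0
    dt_par = slope / curv             # time for the slope to fall to 0: 1/omega0
    rise_par = slope * dt_par / 2     # height gained along the parabola
    t1 = (top - rise_par) / slope     # end of the straight segment: 1/(2 omega0)
    area_line = slope * t1 ** 2 / 2
    g1 = slope * t1                   # value of g where the parabola starts
    # integral of g1 + slope*s - curv*s^2/2 for s in [0, dt_par]
    area_par = g1 * dt_par + slope * dt_par ** 2 / 2 - curv * dt_par ** 3 / 6
    area_ramps = 2 * (area_line + area_par)      # rising and falling halves
    flat = (2 * A - area_ramps) / top            # time on the horizontal segment
    assert flat >= 0, "ramps alone already exceed the required area"
    return 2 * (t1 + dt_par) + flat
```

With $A = \omega_0 = 1$ this returns `Fraction(37, 12)`, i.e. $3\tfrac{1}{12}$, slightly below the conjectured $\pi$. The result scales as $1/\omega_0$ and does not depend on A.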
It would no doubt be possible to improve the value
$3\tfrac{1}{12}$ by more elaborate arguments of the same general type,
finding better g(t) functions with properly bounded values of
$g'''(t)$, $g^{iv}(t)$, etc. It seems difficult, however, to obtain the
conjectured value $\pi$ by this method.
C. E. SHANNON
Fig. 1
Bell Telephone Laboratories
Incorporated
Cover sheet for Technical Memorandum
subject: Concavity of Transmission Rate as a Function of Input
Probabilities - Case 2067o*
COPIES TO:
CASE FILE
DATE FILE
AREA CENTRAL FILES (4)
1 - HWB-WOB-JBF
2 - H. W. Bode
3 - W. R. Bennett
4 - H. S. Black
5 - C. A. Desoer
6 - E. N. Gilbert
7 - R. E. Graham
8 - D. W. Hagelbarger
9 - J. L. Kelly
10 - S. P. Lloyd
11 - L. A. MacColl
12 - B. McMillan
13 - E. F. Moore
14 - J. R. Pierce
15 - S. O. Rice
16 - D. Slepian
MM-55-114-23
date June 3, 1955
author C. E. Shannon
FILING SUBJECT (ASSIGNED BY AUTHOR): Information Theory
ABSTRACT
The following theorem is proved: In a discrete
noisy channel without memory the rate of transmission R
is a concave downward function of the probabilities $P_i$ of
the input symbols. Hence any local maximum of R will be
the absolute maximum or channel capacity C.
Concavity of Transmission Rate as a Function of Input Probabilities
- Case
MM-55-114-23
June 3, 1955
MEMORANDUM FOR FILE
Theorem:
In a discrete noisy channel without memory, the rate of
transmission R is a concave downward function of the probabilities
Pi of the input symbols. Hence, any local maximum of R will be
the absolute maximum or channel capacity C.
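Proof: We have
$$R = H(y) - H_x(y) = -\sum_i Q_i \log Q_i + \sum_j P_j a_j,$$
where the $Q_i$ are the probabilities of the various received symbols, $p_j(i)$ is the probability of receiving symbol i when symbol j is transmitted, and $a_j = \sum_i p_j(i)\log p_j(i)$ is the negative of the conditional entropy of the received symbol when the transmitted symbol is the j-th one. (Logarithms are natural.)
A condition for concavity of R is that $\sum_{jk} R_{jk}\,\Delta P_j\,\Delta P_k$, where $R_{jk} = \partial^2 R/\partial P_j\,\partial P_k$,
be a negative semi-definite form.* We have
$$\frac{\partial R}{\partial P_j} = -\sum_i (1 + \log Q_i)\,p_j(i) + a_j,$$
using the fact that $Q_i = \sum_j P_j\,p_j(i)$, and
$$R_{jk} = -\sum_i \frac{p_j(i)\,p_k(i)}{Q_i}.$$
*See "Inequalities," Hardy, Littlewood and Pólya, Cambridge 1934,
p. 80.
Hence
$$\sum_{jk} R_{jk}\,\Delta P_j\,\Delta P_k = -\sum_i \frac{1}{Q_i}\Big(\sum_j p_j(i)\,\Delta P_j\Big)\Big(\sum_k p_k(i)\,\Delta P_k\Big) = -\sum_i \frac{(\Delta Q_i)^2}{Q_i}. \qquad (1)$$
This displays the sum as necessarily non-positive, since
all terms are non-positive, and consequently shows that $R_{jk}$ is
negative semi-definite and R a concave function. The simplicity
of the formula (1) for the second derivative of R in an arbitrary
direction is quite striking.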
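Formula (1) is easy to test against a finite-difference second derivative of R. The sketch below assumes a small made-up channel matrix, with `channel[j][i]` $= p_j(i)$, and natural logarithms, as the derivative $1 + \log Q_i$ requires. It computes R directly as $H(y) - H_x(y)$.

```python
import math

def output_probs(P, channel):
    """Q_i = sum_j P_j p_j(i); also gives Delta Q_i when P is a direction dP."""
    return [sum(P[j] * channel[j][i] for j in range(len(P)))
            for i in range(len(channel[0]))]

def rate(P, channel):
    """R = H(y) - H_x(y) in nats."""
    Q = output_probs(P, channel)
    h_y = -sum(q * math.log(q) for q in Q if q > 0)
    h_y_given_x = -sum(P[j] * sum(p * math.log(p) for p in row if p > 0)
                       for j, row in enumerate(channel))
    return h_y - h_y_given_x

def second_derivative_formula(P, dP, channel):
    """Formula (1): sum_jk R_jk dP_j dP_k = -sum_i (dQ_i)^2 / Q_i."""
    Q = output_probs(P, channel)
    dQ = output_probs(dP, channel)
    return -sum(dq * dq / q for dq, q in zip(dQ, Q))

def second_derivative_numeric(P, dP, channel, h=1e-4):
    """Central second difference of R along the direction dP."""
    plus = [p + h * d for p, d in zip(P, dP)]
    minus = [p - h * d for p, d in zip(P, dP)]
    return (rate(plus, channel) - 2 * rate(P, channel) + rate(minus, channel)) / h ** 2
```

The two values agree to within the finite-difference error. The closed form is never positive, which is the concavity claim.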
A corollary to this result is the following. Consider
the set S of points $(P_1, P_2, \ldots, P_n)$ with $\sum P_i = 1$ for which R
has its maximum value. Normally, of course, there is only one
point in the set, but in other cases it is not so limited. Our
theorem allows us to deduce that S is always a convex set of
points, for if R is maximized at $(P_1, \ldots, P_n)$ and also at
$(P_1', \ldots, P_n')$, it must clearly have the same value at $(\alpha P_1 + (1-\alpha)P_1', \ldots, \alpha P_n + (1-\alpha)P_n')$ for $0 \le \alpha \le 1$ (by concavity it is at least the maximum value there, and it cannot exceed it).
C. E. SHANNON
[Fig. 2. Circuit of Synthesizer. Labeled sections: steering circuit; card display circuit; memory relays; teletype contacts; tape input circuit; number of states comparison circuit; permutation-negation circuit; input switches; short cut eliminator; lamp indications (110 V DC, NE-2 neon lamps); neon lamps for indicating sequencing.]
[Figure, drawing B-362338; D.T.A., June 30, 1954.]
[Figure, drawing B-362340; D.T.A., June 30, 1954.]
[Fig. 5, drawing B-362745: input switch with positions Open, Don't Care and Closed, wired from the permuter, with connections labeled "to + when testing" and "to - when testing". D.T.A., June 30, 1954.]
A SKELETON KEY TO THE INFORMATION SEMINAR NOTES
The material in these notes has not for the most part been
published and is for personal use only. The notes are not complete.
Several key sections are not yet available, consequently there are a
number of forward and backward references which are quite meaningless.
The remaining sections will be handed out as soon as available.
The parts of the notes now available are not arranged in the
correct order for easiest reading. The following rearrangement of sec-
tions should be made:
Some Useful Inequalities for Distribution Functions - p. 1a - 3a
A Lower Bound on the Tail of a Distribution - p. 1y - 9y
A Combination Theorem - p. 1m
Some Results on Determinants - p. 1b - 3b
Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements
The Number of Sequences of a Given Length
Characteristic for a Language with Independent Letters
The Probability of Error in Optimal Codes
Page with figures 1, 2 and 3
Zero Error Codes and the Zero Error Capacity - p. 1g - 6g
Theorem - p. 1h - 3h
Figure 4
Lower Bound for $P_{e\,\mathrm{opt}}$ for a Completely Connected Channel - p. 2r - 3r
ad for f& p. lk - 5k
Application of "Sphere-packing" Bounds to Feedback Case - p. 1p - 3p
Theorem - p. 1q - 4q
Theorem - p. 1j
A Result for the Memoryless Feedback Channel - p. 1r
Continuity of $P_{e\,\mathrm{opt}}$ as a Function of Transition Probabilities - p. 1e
Codes of a Fixed Composition - p. 1f
Relation of $P_e$ to n - p. 1i - 2i
BpUBl or Pg for Random Code by Simple Threshold Argument - si - eki^
A Bound on $P_e$ for a Random Code - p. 1d - 3d
The Feinstein Bound - pages 1l & 2l
Relations Between Reliability and Minimum Word Separation - p. l2 ( 22 , 62 & 72
Inequalities for Decodable Codes - p. 1n - 3n
Convexity of Channel Capacity as a Function of Transition Probabilities - p. 1o
A Geometric Interpretation of Channel Capacity - p. 1x - 6x
Log Moment Generating Function for the Square of a Gaussian Variate - p. p1 - p2
Upper Bound on $P_e$ for Gaussian Channel by Expurgated Random Code - p. s1 - s2
Lower Bound on $P_e$ in Gaussian Channel by Minimum Distance Argument - p. a1 - a2
The Sphere Packing Bound for the Gaussian Power Limited Channel - p. c1 - c5
The T-terminal Channel p. .fl - 67
Conditions for Constant Mutual Information p. 1066
Simple Proof p. 1024
The following errata have been found:
p. ly line 10 > 1
line 11 for any positive <^
line 14 ^(1 - ep"-
p. 2y line 8 V, <Y2 <. . . . 7%
p. Jw - lines 1. 2. 4, 7, 8, 9, 13. 17 subscripts on $ should
be in line.
1
p. 2c - line 7 * log Prob
n p
4c - Eq. (7) E(8) - -^(s) log - - (ji - su«)
Eq. (8) R(s) = £^(8) log qi(s)°1 » n - («-l)
line 6 dE , dR ^ - n' + six" + n' - . s
ds ' ds * n1 + (1-s) uM-u' ~
line 2 E(l) « j log p^1 + log d
page 3| - line 3 - log min. jT
page J*g - line 9 change mar. to min.
Fig. 4 bottom line - change 3 to 2.
page 5K equation (l) min
V 1°U = 1
I would appreciate knowing any further errors of any sort that
are found in the notes. I expect there are a good many there. I would
also be interested to know of any parts that are particularly difficult
to follow and perhaps need rewriting.
Claude E. Shannon
Bounds on the Tails of Martingales and Related Questions
Claude E. Shannon
Department of Electrical Engineering
Department of Mathematics
and
Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts
This paper is concerned with the problem of overbounding the probability that the sum of n dependent random variables exceeds a certain quantity. Certain restrictions are assumed concerning the distribution of the ith random variable $x_i$ conditional on the preceding random variables. As an example, we might have a gambler playing some "system" in which $x_i$ is his winning on the ith bet. Suppose he can choose any distribution he desires for $x_i$ conditional on the preceding plays, $x_1, x_2, \ldots, x_{i-1}$, subject however to the conditions 1) it is a "fair" bet, $E(x_i \mid x_{i-1}, \ldots, x_1) = 0$; 2) there is a "house limit" on possible wins or losses for one bet, $\Pr[x_i \le x \mid x_{i-1}, \ldots, x_1] = 0$ for $x < L$ and $\Pr[x_i \le x \mid x_{i-1}, \ldots, x_1] = 1$ for $x \ge W$. It is desired to find an upper bound on the probability that the gambler's winnings will exceed a certain limit X after n bets. This bound will of course be a function of L, W, n and X but is to be independent of the system used.
Thought of another way, we can imagine the gambler mapping out a strategy, subject to the house rules, to try to maximize the probability of ending up after n bets with a total winning of X or more. If this is his object, he would clearly be wise, for example, if he ever reached the level X, not to risk any future loss. This he could do by choosing a distribution function thereafter which is 0 for negative x and 1 for positive x.
We will find a bound for this problem and various other similar problems with different side constraints on the allowed distribution functions. The results have applications in various problems related to random walks, gambler's ruin problems and certain coding problems in information theory.
In the example above, the gambler's total capital forms a martingale because of the "fair bet" condition. Bounds on the tails of martingales are known in terms of the variances of the successive amounts won. The bounds we obtain are in terms of conditional moment generating functions. As such, they require more in the way of restrictions on the distributions (for the moment generating functions to exist), but give tighter bounds. Our bounds bear the same relation to the variance type bounds for martingales that the Chernoff bound does to the Chebycheff bound for sums of independent random variables.
The Main Inequality
The method we use is based on a bound for the tail of a distribution due to Chernoff [1]. Let P(x) be the distribution function of a random variable x, and suppose that its moment generating function
$$v(s) = \int_{-\infty}^{\infty} e^{sx}\,dP(x)$$
exists over some s interval including the origin in its interior. This will certainly be true, for example, if $P(x) \le e^{ax}$ for some $a > 0$ and sufficiently large negative x, and $1 - P(x) \le e^{-bx}$ for some positive b and sufficiently large positive x.
We first derive a somewhat generalized formulation of the Chernoff bound. Let $\mu(s) = \log v(s)$ be the semi-invariant generating function.
Lemma 1: Suppose the semi-invariant generating function $\mu(s)$ for a random variable x exists for $-a < s < b$ and does not exceed another differentiable function of s, $\psi(s)$. Thus $\mu(s) \le \psi(s)$. Then
$$\Pr[x \ge \psi'(s)] \le e^{\psi(s) - s\psi'(s)}, \qquad 0 \le s < b,$$
$$\Pr[x \le \psi'(s)] \le e^{\psi(s) - s\psi'(s)}, \qquad -a < s \le 0.$$
This result is like the Chernoff bound except for replacement of $\mu(s)$ by an upper bounding function $\psi(s)$, and may be proved by similar means. Thus by the generalized Chebycheff inequality, for $s \ge 0$,
$$e^{sX}\Pr[x \ge X] \le \int_X^{\infty} e^{sx}\,dP(x) \le \int_{-\infty}^{\infty} e^{sx}\,dP(x) = v(s) = e^{\mu(s)} \le e^{\psi(s)}.$$
This is true for any X. Set $X = \psi'(s)$. Then
$$\Pr[x \ge \psi'(s)] \le e^{\psi(s) - s\psi'(s)}.$$
A similar argument gives the dual inequality for negative s.
We now develop a formula for the moment generating function of the sum of a set of dependent random variables, $u = x_1 + x_2 + \cdots + x_n$, where the distribution function of $x_1, \ldots, x_n$ is given by
$$P(a_1, a_2, \ldots, a_n) = \Pr[x_1 \le a_1, x_2 \le a_2, \ldots, x_n \le a_n].$$
It is also assumed that for this multivariate distribution the moment generating functions for various random variables conditional on others exist. To avoid notational complexity we carry out the analysis only for n = 3, using x, y and z for the three random variables, but the method is clearly general. If v(s) is the moment generating function for the sum variable $u = x + y + z$, then (all integrals are from $-\infty$ to $\infty$)
$$v(s) = \int e^{sx}\,dP(x)\int e^{sy}\,dP(y \mid x)\int e^{sz}\,dP(z \mid x, y).$$
The innermost integral is the moment generating function for z conditional on x and y, and may be denoted by $v_3(s \mid x, y)$ (the 3 referring to the third variable, z). Thus
$$v(s) = \int e^{sx}\,dP(x)\int e^{sy}\,v_3(s \mid x, y)\,dP(y \mid x).$$
Suppose now that we have a bounding function for $v_3(s \mid x, y)$, say $\gamma_3(s)$, independent of x and y:
$$v_3(s \mid x, y) \le \gamma_3(s).$$
Then the innermost integral may be bounded by $\gamma_3(s)$ and this term taken out of the integration. ($v_3$ is clearly non-negative, being an expectation of $e^{sz}$, and the outer weights $e^{sx}$, $e^{sy}$ are positive, so the inequality survives the integrations.) Thus
$$v(s) \le \gamma_3(s)\int e^{sx}\,dP(x)\int e^{sy}\,dP(y \mid x).$$
Similarly, suppose the moment generating function of y conditional on x is bounded by $\gamma_2(s)$,
$$v_2(s \mid x) = \int e^{sy}\,dP(y \mid x) \le \gamma_2(s),$$
and the moment generating function of x is bounded by $\gamma_1(s)$,
$$\int e^{sx}\,dP(x) \le \gamma_1(s).$$
Then these may also be used to bound the integrals, giving
$$v(s) \le \gamma_1(s)\,\gamma_2(s)\,\gamma_3(s).$$
Taking logarithms, the semi-invariant generating function $\mu(s)$ for the sum variable u is therefore bounded by the sum of the logarithms of the $\gamma(s)$ functions, that is, by uniform bounds $\mu_i(s) = \log\gamma_i(s)$ on the conditional semi-invariant functions for the different variables:
$$\mu(s) \le \mu_1(s) + \mu_2(s) + \mu_3(s).$$
The same argument carries through for the sum of any number of random variables and may be summarized as follows.
Lemma 2: The semi-invariant generating function $\mu(s)$ for the sum of n random variables is bounded by
$$\mu(s) \le \sum_{i=1}^{n}\mu_i(s),$$
where $\mu_i(s)$ is a uniform bound on the semi-invariant function for the ith random variable conditional on the first $i - 1$:
$$\log\int e^{sx_i}\,dP(x_i \mid x_1, x_2, \ldots, x_{i-1}) \le \mu_i(s).$$
In most applications the same bound, say $\mu_0(s)$, will apply to all the random variables. In this case $\mu(s) \le n\mu_0(s)$. Combining Lemmas 1 and 2 we obtain our first main result, a bound on the tail probability of a sum of dependent random variables provided the conditional moment generating functions exist.
Theorem 1: If u is the sum of n dependent random variables $x_i$ ($i = 1, 2, \ldots, n$) whose semi-invariant generating functions conditional on the preceding variables, $\log\int e^{sx_i}\,dP(x_i \mid x_1, \ldots, x_{i-1})$, exist and are bounded by differentiable functions $\mu_i(s)$ ($i = 1, 2, \ldots, n$), then
$$\Pr\Big[u \ge \sum_i \mu_i'(s)\Big] \le \exp\Big[\sum_i \mu_i(s) - s\sum_i \mu_i'(s)\Big], \qquad s \ge 0,$$
$$\Pr\Big[u \le \sum_i \mu_i'(s)\Big] \le \exp\Big[\sum_i \mu_i(s) - s\sum_i \mu_i'(s)\Big], \qquad s \le 0.$$
Applications
In applications of this result we would generally attempt to find the smallest bounding functions $\mu_i(s)$ in order to obtain the tightest bound on the tail probability. As a first example consider a gambler allowed to choose a wager with an arbitrary distribution function $\phi(x)$ (the probability of gaining x or less), subject however to the following conditions:
1) The expected gain is zero: $\int x\,d\phi(x) = 0$.
2) $\phi(x) \le \phi_1(x)$, where $\phi_1(x)$ is a distribution function with negative mean for which $\int e^{sx}\,d\phi_1(x)$ exists for some negative s.
3) $\phi(x) \ge \phi_2(x)$, where $\phi_2(x)$ is a distribution function with positive mean for which $\int e^{sx}\,d\phi_2(x)$ exists for some positive s, and $\phi_2(x) \le \phi_1(x)$.
Thus our gambler is allowed to choose a distribution function at each wager lying between two given curves $\phi_1(x)$ and $\phi_2(x)$ (as suggested by Fig. 1), which approach 0 and 1 with a certain rapidity. He is also constrained to choose a distribution function with zero mean. The situation described earlier involving house limits is a case of this type where the distributions $\phi_1$ and $\phi_2$ are step functions at L and W, the maximum allowed loss or win per wager.
Fig. 1.
To apply the theorem we need a function which bounds the moment generating functions which he can achieve with these restrictions. Consider the distribution function $\phi_0(x)$ defined as follows:
$$\phi_0(x) = \begin{cases}\phi_1(x), & x < \alpha,\\ k, & \alpha \le x \le \beta,\\ \phi_2(x), & x > \beta,\end{cases}$$
where $\alpha$ is the first point at which $\phi_1(x)$ reaches the value k and $\beta$ is the first point at which $\phi_2(x)$ reaches k. $\phi_0(x)$ is a distribution function, and by adjusting k we can clearly make the mean of the distribution $\phi_0(x)$ equal zero. With this value of k we will show that the moment generating function for any allowed $\phi(x)$ is bounded by that for $\phi_0(x)$.
Since $\phi(x)$ and $\phi_0(x)$ have the same mean (namely zero), we have, integrating by parts,
$$0 = \int_{-\infty}^{\infty} x\,d[\phi_0(x) - \phi(x)] = \Big[x\big(\phi_0(x) - \phi(x)\big)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty}[\phi_0(x) - \phi(x)]\,dx,$$
so that
$$\int_{-\infty}^{\infty}[\phi_0(x) - \phi(x)]\,dx = 0,$$
where we use the exponential approach of $\phi$ and $\phi_0$ to 0 and 1 as x goes to $-\infty$ and $+\infty$ to insure the vanishing of the term $x(\phi_0(x) - \phi(x))$ at these limits.
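Now consider the quantity (again using integration by parts)
$$\int_{-\infty}^{\infty} e^{sx}\,d[\phi_0(x) - \phi(x)] = -s\int_{-\infty}^{\infty} e^{sx}[\phi_0(x) - \phi(x)]\,dx, \qquad -a < s < b,$$
where $-a$ and b are the limits of existence of the moment generating functions. On the horizontal segment $\alpha \le x \le \beta$ of $\phi_0$ the difference $\phi_0(x) - \phi(x) = k - \phi(x)$ is non-increasing, while $\phi_0 - \phi = \phi_1 - \phi \ge 0$ for $x < \alpha$ and $\phi_0 - \phi = \phi_2 - \phi \le 0$ for $x > \beta$. Hence there is a point $\delta$ ($\alpha \le \delta \le \beta$) such that $\phi_0(x) - \phi(x) \ge 0$ for $x < \delta$ and $\phi_0(x) - \phi(x) \le 0$ for $x > \delta$. Split the integral at $\delta$. The first term, $-s\int_{-\infty}^{\delta} e^{sx}[\phi_0(x) - \phi(x)]\,dx$, is greater than or equal to $-s\,e^{\delta s}\int_{-\infty}^{\delta}[\phi_0(x) - \phi(x)]\,dx$, since, when s is positive, $e^{\delta s} \ge e^{sx}$ for $x < \delta$, $\phi_0 - \phi$ is non-negative and the coefficient $-s$ is negative; if s is negative, both the exponential comparison and the sign of $-s$ reverse, and the same inequality results. In a similar way, the second term, $-s\int_{\delta}^{\infty} e^{sx}[\phi_0(x) - \phi(x)]\,dx$, is greater than or equal to $-s\,e^{\delta s}\int_{\delta}^{\infty}[\phi_0(x) - \phi(x)]\,dx$, as one verifies by examination of the two cases $s \ge 0$ and $s < 0$, remembering that $\phi_0(x) - \phi(x)$ is negative or zero in this range. Thus we conclude
$$\int_{-\infty}^{\infty} e^{sx}\,d\phi_0(x) - \int_{-\infty}^{\infty} e^{sx}\,d\phi(x) \ge -s\,e^{\delta s}\int_{-\infty}^{\infty}[\phi_0(x) - \phi(x)]\,dx = 0.$$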
In other words, the moment generating function for the distribution $\phi_0(x)$ dominates that of any other distribution with the same mean as $\phi_0$ and bounded by the $\phi_1$ and $\phi_2$ curves. Therefore the moment generating function for $\phi_0$ may be used in our bounds for the tail of a sum distribution if the individual conditional distributions satisfy this type of restriction.
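The domination result can be spot-checked in the house-limit case. There $\phi_1$ and $\phi_2$ are unit steps at L and W, and $\phi_0$ is the zero-mean two-point law with mass $W/(W-L)$ at L and $-L/(W-L)$ at W. The sketch below draws hypothetical zero-mean wagers on $[L, W]$ as random mixtures of zero-mean two-point laws and confirms that none of them has a larger moment generating function. The sampling scheme is an assumption made only for testing.

```python
import math
import random

def mgf(points, weights, s):
    """E[e^{s x}] for a discrete distribution."""
    return sum(w * math.exp(s * x) for x, w in zip(points, weights))

def extreme_binomial(L, W):
    """phi_0 for house limits: mass W/(W-L) at L and -L/(W-L) at W (zero mean)."""
    return [L, W], [W / (W - L), -L / (W - L)]

def random_fair_wagers(L, W, count, pairs=5, seed=0):
    """Hypothetical test wagers: mixtures of zero-mean two-point laws on [L, W]."""
    rng = random.Random(seed)
    wagers = []
    for _ in range(count):
        mix = [rng.random() + 1e-9 for _ in range(pairs)]
        total = sum(mix)
        points, weights = [], []
        for m in mix:
            a, b = rng.uniform(L, 0.0), rng.uniform(0.0, W)   # a < 0 <= b
            points += [a, b]
            weights += [m / total * b / (b - a), m / total * -a / (b - a)]
        wagers.append((points, weights))
    return wagers
```

In this special case the same conclusion also follows from convexity of $e^{sx}$: on $[L, W]$ the exponential lies below its chord, and taking expectations with zero mean gives exactly the two-point mgf.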
Using this bound on the conditional moment generating functions in Theorem 1, our solution may be summarized as follows. Suppose at each play of a game the distribution functions available to a gambler all have zero mean and lie between two functions $\phi_1(x)$ and $\phi_2(x)$. Let $\phi_0(x)$ be the zero mean function consisting of $\phi_1$ followed by a flat segment, followed by $\phi_2$. Let
$$\mu(s) = \log\int_{-\infty}^{\infty} e^{sx}\,d\phi_0(x).$$
Then the probability of his winnings u after n wagers exceeding $n\mu'(s)$ satisfies
$$\Pr[u \ge n\mu'(s)] \le e^{n[\mu(s) - s\mu'(s)]}, \qquad s \ge 0.$$
This same bound applies, of course, also with a semi-martingale condition, that is, if the gambler's expectation is only required to be non-positive.
If $\phi_1(0) = 1$ and $\phi_2(0) = 0$ (so the gambler can play a wager that amounts to stopping the game, that is, a distribution which is a unit step at zero), then this same bound applies to the probability of exceeding $n\mu'(s)$ on any of the first n trials. This is because the bound covers all strategies.
Any particular strategy could be modified so that if the gambler reaches the level $n\mu'(s)$ at any time before the nth trial he then effectively holds his winnings by playing the distribution with unit step at zero. The bound must exceed the probability of exceeding the level $n\mu'(s)$ for this strategy at the nth step, but this is a bound on the probability of ever exceeding the level in the first n steps. This device can be used in many applications of the method we are describing, provided only that the unit step at zero is an allowed distribution function.
The bound given, while certainly not the best possible for all values of the parameters, is, however, best possible in the coefficient of n in the exponent. That is, the result would be false if $\mu(s) - s\mu'(s)$ in the right-hand exponential term were replaced by $\mu(s) - s\mu'(s) - \epsilon$ for any positive $\epsilon$. This may be seen as follows. The gambler could, within the rules, choose the distribution $\phi_0(x)$ at each wager. If he does so, then we have a sum of n independent random variables, each with semi-invariant generating function $\mu(s)$. Lower bounds on the tail of this sum distribution are known to exceed $e^{n[\mu(s) - s\mu'(s) - \epsilon]}$ when n is sufficiently large.
The Case with House Limits on Win or Loss for each Wager
For the case of the gambler who can choose an arbitrary distribution with zero mean and house limits on wins and losses W and L ($L < 0$) respectively, the distribution to maximize $\mu(s)$ is, from the above analysis, a binomial distribution with jumps at the ends of the interval, W and L, adjusted to give a zero mean. The two probabilities are $W/(W - L)$ at L and $-L/(W - L)$ at W.
To gain a little in generality and simplify notation, consider a binomial with probability p at value L and probability $q = 1 - p$ at W. The semi-invariant generating function is
$$\mu(s) = \log(pe^{sL} + qe^{sW}),$$
$$\mu'(s) = \frac{pLe^{sL} + qWe^{sW}}{pe^{sL} + qe^{sW}}.$$
The expression for the bound on the tail may be simplified by a change of variables eliminating s. Let
$$\lambda = \frac{pe^{sL}}{pe^{sL} + qe^{sW}}, \qquad \eta = 1 - \lambda = \frac{qe^{sW}}{pe^{sL} + qe^{sW}}.$$
Then
$$\frac{\lambda}{\eta} = \frac{p}{q}\,e^{s(L - W)}, \qquad \mu'(s) = \lambda L + \eta W.$$
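Since $\log(pe^{sL} + qe^{sW}) = \log p + sL - \log\lambda = \log q + sW - \log\eta$,
$$\mu(s) - s\mu'(s) = \lambda(\log p + sL - \log\lambda) + \eta(\log q + sW - \log\eta) - s(\lambda L + \eta W) = \lambda\log\frac{p}{\lambda} + \eta\log\frac{q}{\eta}.$$
Letting p equal $W/(W - L)$ and q equal $-L/(W - L)$ and using our result bounding the tail of the sum of n random variables, we obtain the following bound for the probability of the gambler exceeding a certain level after n wagers:
$$\Pr[u \ge n(\lambda L + \eta W)] \le \left[\left(\frac{W}{(W - L)\lambda}\right)^{\lambda}\left(\frac{-L}{(W - L)\eta}\right)^{\eta}\right]^{n}, \qquad 0 < \lambda \le p, \quad \eta = 1 - \lambda,$$
the range $\lambda \le p$ corresponding to $s \ge 0$.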
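As a sanity check on the house-limit formula, it can be compared with the exact tail of one admissible strategy: playing the extreme binomial at every wager. Because the formula covers all strategies, it must dominate that exact tail. The limits $L = -1$, $W = 3$ and the number of wagers used below are arbitrary test values.

```python
import math

def house_limit_bound(L, W, n, lam):
    """Level n(lam L + eta W) and the bound [(p/lam)^lam (q/eta)^eta]^n, 0 < lam <= p."""
    p, q = W / (W - L), -L / (W - L)
    eta = 1.0 - lam
    level = n * (lam * L + eta * W)
    bound = ((p / lam) ** lam * (q / eta) ** eta) ** n
    return level, bound

def exact_tail_extreme(L, W, n, level):
    """Pr[sum >= level] when the extreme binomial (q at W, p at L) is played n times."""
    p, q = W / (W - L), -L / (W - L)
    return sum(math.comb(n, k) * q ** k * p ** (n - k)
               for k in range(n + 1) if k * W + (n - k) * L >= level - 1e-12)
```

At $\lambda = p$ the level is the mean, 0, and the bound degenerates to 1. Smaller $\lambda$ raises the level and tightens the bound exponentially in n.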
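If $L = -W$, that is, the win and loss limits are the same, this formula can be simplified somewhat at the expense of a certain weakening. Then $p = q = 1/2$ and it becomes
$$\Pr[u \ge nW(1 - 2\lambda)] \le \left[\tfrac{1}{2}\,\lambda^{-\lambda}\eta^{-\eta}\right]^{n}.$$
Let $\lambda = \tfrac{1}{2}(1 - \theta)$, $\eta = \tfrac{1}{2}(1 + \theta)$. Then
$$\Pr[u \ge nW\theta] \le \left[(1+\theta)^{-(1+\theta)}(1-\theta)^{-(1-\theta)}\right]^{n/2} = e^{-\frac{n}{2}\left[(1+\theta)\ln(1+\theta) + (1-\theta)\ln(1-\theta)\right]}, \qquad 0 \le \theta < 1.$$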
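Consider the bracketed term in the exponent and expand the logarithms as series.
$$(1+\theta)\ln(1+\theta) + (1-\theta)\ln(1-\theta) = (1+\theta)\Big(\theta - \frac{\theta^2}{2} + \frac{\theta^3}{3} - \cdots\Big) + (1-\theta)\Big(-\theta - \frac{\theta^2}{2} - \frac{\theta^3}{3} - \cdots\Big)$$
$$= \frac{\theta^2}{1} + \frac{\theta^4}{6} + \frac{\theta^6}{15} + \cdots + \frac{\theta^{2m}}{m(2m-1)} + \cdots \ge \theta^2.$$
Hence
$$\Pr[u \ge nW\theta] \le e^{-n\theta^2/2}, \qquad \theta \ge 0.$$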
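The sketch below checks the series expansion and the two symmetric bounds against an exactly computable case with dependent bets. It assumes a hypothetical strategy with $W = 1$: bet $\pm 1$ with probability $\tfrac12$ each until the capital reaches the target $nW\theta$, then play the unit step at zero. The bets depend on the past through the stopping rule. As argued above, the bound should then cover the probability of reaching the level at any time within n wagers.

```python
import math

def exponent_bracket(theta):
    """(1+theta) ln(1+theta) + (1-theta) ln(1-theta), for 0 <= theta < 1."""
    return (1 + theta) * math.log1p(theta) + (1 - theta) * math.log1p(-theta)

def series_bracket(theta, terms=60):
    """The series sum_m theta^(2m) / (m (2m-1)), m = 1, 2, ..."""
    return sum(theta ** (2 * m) / (m * (2 * m - 1)) for m in range(1, terms + 1))

def stop_at_target_probability(n, target):
    """Exact Pr[capital >= target after n wagers] for the hypothetical strategy.

    Fair +/-1 bets (house limits W = 1, L = -1) until the capital reaches
    `target`, then the unit step at zero, i.e. the winnings are held.
    """
    dist = {0: 1.0}                          # capital -> probability
    for _ in range(n):
        nxt = {}
        for c, pr in dist.items():
            if c >= target:                  # holding: unit step at zero
                nxt[c] = nxt.get(c, 0.0) + pr
            else:
                nxt[c + 1] = nxt.get(c + 1, 0.0) + pr / 2
                nxt[c - 1] = nxt.get(c - 1, 0.0) + pr / 2
        dist = nxt
    return sum(pr for c, pr in dist.items() if c >= target)
```

With n = 40 and target 10 ($\theta = 0.25$), the exact probability is roughly 0.12, while the two bounds are about 0.283 and 0.287. The sharper form is the one with the full bracketed exponent.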
It may be noted that this bound is similar to the exponential part of the normal approximation to the sum of n binomial samples, probabilities $\tfrac{1}{2}$ at $\pm W$, without, however, the coefficient term that would ordinarily appear. This might suggest that the gambler's best strategy to maximize the probability of exceeding $nW\theta$ would be to continually play the extreme binomial distribution, or at least until he was within W of it and then switch to a binomial which would just carry him over the limit if he won. While this appears to be a rather good strategy, it is not quite optimal
as a study of small n values reveals. Determining the optimal strategy
appears to involve considerable combinatorial complexity.
The Probability of ever exceeding a Limit with a Negative Expectation
Suppose now that the conditional expectation of all wagers is negative and we are interested in a bound on the probability of ever (in an infinite series of wagers) exceeding a certain (positive) value. If the expectation were zero, then by well known results in the gambler's ruin problem the only bound is unity, provided, for example, the gambler can play a binomial distribution. With a negative mean, however, significant bounds can be obtained as follows.
We consider the case again where the allowed distribution functions must lie between two given distribution functions $\phi_1(x)$ and $\phi_2(x)$ but now must have a mean $m < 0$. The maximum $\mu(s)$ is obtained by the same construction using $\phi_1$ and $\phi_2$, but with a placement of the horizontal segment to give the mean m; let $\phi(x)$ denote the distribution so constructed.
If $\phi(0)$ is 1, then $\phi_2(0)$ must have been 1, and no allowed bet whatever will ever give a positive return. Thus clearly the probability of ever exceeding any positive bound is zero. We will therefore assume that $\phi(0) < 1$. This assumption also excludes $\phi(x)$ being a unit step, since the step would have to occur at the negative number m, making $\phi(0)$ equal 1.
Under the assumption $\phi(0) < 1$, the $\mu(s)$ curve has the general form shown in Fig. 2.
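The curve is convex downward; it passes through zero at $s = 0$ with a negative slope m; it has a unique minimum at $s = s_1$ (say); and it passes through zero again at $s_0 > s_1$. These facts follow readily from the relations
$$v(s) = \int e^{sx}\,d\phi(x), \qquad v(0) = \int d\phi(x) = 1,$$
$$v'(s) = \int x e^{sx}\,d\phi(x), \qquad v'(0) = \int x\,d\phi(x) = m,$$
$$v''(s) = \int x^2 e^{sx}\,d\phi(x),$$
$$\mu(s) = \ln v(s), \qquad \mu(0) = 0,$$
$$\mu'(s) = \frac{v'(s)}{v(s)}, \qquad \mu'(0) = m,$$
$$\mu''(s) = \frac{v(s)\,v''(s) - v'(s)^2}{v(s)^2}.$$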
The numerator of $\mu''(s)$ is positive by the Schwarz inequality (the unit step, which would give zero, being excluded). Hence the $\mu$ curve is strictly convex downward. Also, for sufficiently large positive s, v(s) will exceed 1 and $\mu(s)$ will be positive, since $\phi(0) < 1$. Consequently, the minimum (at $s = s_1$) and the positive zero crossing at $s = s_0$ both exist.
Suppose we are interested in a bound on the probability of ever reaching or exceeding A with the sums $u_1 = x_1$, $u_2 = x_1 + x_2$, ..., $u_n = x_1 + x_2 + \cdots + x_n$, .... We have
$$\Pr[\text{any } u_n \ge A] \le \sum_{n=1}^{\infty}\Pr[u_n \ge A].$$
From our above results $\Pr[u_n \ge A] \le e^{n[\mu(s) - s\mu'(s)]}$ for the s such that $A = n\mu'(s)$. The particular n for which this bound is largest may be obtained by maximizing $n[\mu(s) - s\mu'(s)]$ given $A = n\mu'(s)$, or, in other words, maximizing $A[\mu(s)/\mu'(s) - s]$. Since $\mu''(s) > 0$ this maximum exists and occurs at a unique s found by differentiation (the derivative is $-\mu\mu''/\mu'^2$), namely, the s for which $\mu(s) = 0$. This s is the $s_0$ of Fig. 2, and the corresponding n we call $n_0$. Thus $s_0$ and $n_0$ satisfy
$$\mu(s_0) = 0, \qquad n_0\,\mu'(s_0) = A.$$
In general, $n_0$ will not be an integer, but the bound obtained for evaluation at $n_0$ and $s_0$ certainly is greater than that for any integer points. Hence for any particular n,
$$\Pr[u_n \ge A] \le e^{n_0[\mu(s_0) - s_0\mu'(s_0)]} = e^{-s_0 A}.$$
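Now consider the $s_1$ where $\mu'(s_1) = 0$ (Fig. 2), and $n_1$ defined by
$$n_1\,\mu(s_1) = -s_0 A.$$
Again, in general, $n_1$ will not be an integer. We let, however, $[n_1]$ denote the largest integer contained in $n_1$. Since $s_1 > 0$, the Chebycheff step in the proof of Lemma 1, combined with Lemma 2, also gives $\Pr[u_n \ge A] \le e^{n\mu(s_1) - s_1 A} \le e^{n\mu(s_1)}$.
Returning to our inequality on the probability of $u_n$ ever exceeding A, we may rewrite as follows:
$$\Pr[\text{any } u_n \ge A] \le \sum_{n=1}^{[n_1]}\Pr[u_n \ge A] + \sum_{n=[n_1]+1}^{\infty}\Pr[u_n \ge A] \le n_1 e^{-s_0 A} + \sum_{n=[n_1]+1}^{\infty} e^{n\mu(s_1)} \le n_1 e^{-s_0 A} + \frac{e^{n_1\mu(s_1)}}{1 - e^{\mu(s_1)}},$$
that is,
$$\Pr[\text{any } u_n \ge A] \le e^{-s_0 A}\left[n_1 + \frac{1}{1 - e^{\mu(s_1)}}\right].$$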
This is our desired bound. It is essentially exponentially decreasing in A. In fact, a more refined analysis can be given to show that the bracketed term can be replaced by a more involved expression which does not increase with A.
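The bound can be evaluated numerically once $\phi$ is given. The sketch below locates $s_1$ (where $\mu' = 0$) and $s_0$ (the positive zero of $\mu$) by bisection for a finite discrete distribution; this numerical procedure is an illustrative choice. The test case is an assumed $\pm 1$ walk with win probability $q = 0.4$. There the exact probability of ever reaching an integer level A is the classical ruin result $(q/p)^A$, and $e^{s_0} = p/q$, so the leading factor $e^{-s_0 A}$ is exact and the bracket is the price of the union bound.

```python
import math

def mu(s, points, probs):
    """Semi-invariant generating function ln E[e^{s x}]."""
    return math.log(sum(p * math.exp(s * x) for x, p in zip(points, probs)))

def mu_prime(s, points, probs):
    """Derivative of mu: E[x e^{s x}] / E[e^{s x}]."""
    num = sum(p * x * math.exp(s * x) for x, p in zip(points, probs))
    den = sum(p * math.exp(s * x) for x, p in zip(points, probs))
    return num / den

def bisect(f, lo, hi, iters=200):
    """Root of an increasing f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ever_exceed_bound(A, points, probs, s_max=50.0):
    """Return s0, s1 and e^{-s0 A}[n1 + 1/(1 - e^{mu(s1)})], n1 = -s0 A / mu(s1)."""
    s1 = bisect(lambda s: mu_prime(s, points, probs), 0.0, s_max)
    s0 = bisect(lambda s: mu(s, points, probs), s1, s_max)
    m1 = mu(s1, points, probs)            # negative: the minimum of mu
    n1 = -s0 * A / m1
    return s0, s1, math.exp(-s0 * A) * (n1 + 1.0 / (1.0 - math.exp(m1)))
```

For the $\pm 1$ walk, $s_1$ comes out as exactly $s_0/2$. The growth of $n_1$ with A is what the more refined analysis mentioned above would remove.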
References
[1] Chernoff, H. (1952). A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations. Ann. Math. Stat. 23, 493-507.
Some Useful Inequalities for Distribution Functions
In this section a number of inequalities will be given which are useful in estimating the "tails" of distribution functions or other related statistics.
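Binomial Inequalities: Let
$$G = \frac{1}{\sqrt{2\pi n\lambda\mu}}\,\lambda^{-\lambda n}\mu^{-\mu n}. \qquad (1)$$
Then
$$G\exp\left[-\frac{1}{12}\left(\frac{1}{\lambda n} + \frac{1}{\mu n}\right)\right] \le \binom{n}{\lambda n} \le G \qquad (2)$$
and
$$\frac{\sqrt{\pi}}{2}\,G \le \binom{n}{\lambda n}, \qquad (3)$$
where $\lambda + \mu = 1$, $\lambda n$ is an integer, and neither $\lambda$ nor $\mu$ is zero. (Note that if either is zero, G is undefined.) Similar inequalities hold for the terms of a binomial distribution, $\binom{n}{\lambda n}p^{\lambda n}q^{\mu n}$, and may be obtained by multiplying the above inequalities by $p^{\lambda n}q^{\mu n}$. They may also be generalized to the multinomial coefficient:
$$G = (2\pi n)^{-(s-1)/2}(\lambda_1\lambda_2\cdots\lambda_s)^{-1/2}\prod_{i=1}^{s}\lambda_i^{-\lambda_i n},$$
$$G\exp\left[-\frac{1}{12}\sum_{i=1}^{s}\frac{1}{\lambda_i n}\right] \le \frac{n!}{(\lambda_1 n)!\,(\lambda_2 n)!\cdots(\lambda_s n)!} \le G, \qquad (4)$$
where s is the number of components, $\sum\lambda_i = 1$, and none of the $\lambda_i$ vanishes.
The "tail" of a binomial distribution may be estimated by the following formulas:
$$\sum_{k \ge \lambda n}\binom{n}{k}p^k q^{n-k} \le \frac{1}{1 - \mu p/\lambda q}\,G\,p^{\lambda n}q^{\mu n}, \qquad \lambda > p, \qquad (5)$$
$$\sum_{k \ge \lambda n}\binom{n}{k}p^k q^{n-k} \le \left(\frac{p}{\lambda}\right)^{\lambda n}\left(\frac{q}{\mu}\right)^{\mu n}, \qquad \lambda > p. \qquad (6)$$
The first of these gives a closer estimate of the tail but is somewhat more complex. The inequality (6) (Chernoff) is often convenient because of its simplicity. Lower bounds for tails may be taken to be merely lower bounds for the first term, as in the lower inequalities of (2) or (4).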
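We shall now prove the inequalities (2) and (3). The Stirling approximation for n! is as follows:
$$n! \sim \sqrt{2\pi}\,n^{n+1/2}e^{-n}\exp\left(\frac{1}{12n} - \frac{1}{360n^3} + \frac{1}{1260n^5} - \cdots\right).$$
It is known that if no terms of the series are taken, n! is underestimated; if only the $\frac{1}{12n}$ term is taken, then n! is overestimated; and so on. We wish to overestimate $n!/[(\lambda n)!\,(\mu n)!]$. This will be done if the numerator is overestimated and the denominator underestimated. Thus we may write
$$n! \le \sqrt{2\pi}\,n^{n+1/2}e^{-n}e^{1/(12n)}$$
or
$$\frac{n!}{(\lambda n)!\,(\mu n)!} \le G\exp\left[\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360}\left(\frac{1}{(\lambda n)^3} + \frac{1}{(\mu n)^3}\right)\right].$$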
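We wish to show that the exponential term is less than or equal to one, or, which is the same thing, that its argument is less than or equal to zero. One or the other of λ, μ is the greater. From symmetry, we may assume without loss of generality that it is λ, that is, $\lambda \ge \mu$. Then
$$\frac{1}{360(\lambda n)^3} \le \frac{1}{360(\mu n)^3},$$
and since μn is a positive integer,
$$\frac{1}{360(\mu n)^3} \le \frac{1}{360\,\mu n}.$$
Further, $\frac{1}{12n} - \frac{1}{12\lambda n} \le 0$, since $\lambda n < n$. Using these, we have
$$\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360}\left(\frac{1}{(\lambda n)^3} + \frac{1}{(\mu n)^3}\right) \le \frac{1}{\mu n}\left(\frac{2}{360} - \frac{1}{12}\right) < 0.$$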
This proves the upper bound (2). The lower bound is found similarly by underestimating the numerator and overestimating the denominator. No terms of the series are used for n!, and the terms $\frac{1}{12\lambda n}$ and $\frac{1}{12\mu n}$ are used for the denominator. This gives directly
$$\binom{n}{\lambda n} \ge G\exp\left(-\frac{1}{12\lambda n} - \frac{1}{12\mu n}\right).$$
The other lower bound (3), with $\sqrt{\pi}/2$ in place of the exponential term, is obtained by noting first that unless both λn and μn are less than or equal to two, the argument of the exponential, $-\frac{1}{12\lambda n} - \frac{1}{12\mu n}$, is not less than $-\frac{1}{12} - \frac{1}{36} = -\frac{1}{9}$. Now $e^{-1/9} \approx 0.895 > \sqrt{\pi}/2 \approx 0.886$, and it is also readily verified that the result is true for the four cases where both λn and μn do not exceed two, namely (2,2), (2,1), (1,2) and (1,1). The worst case is (1,1), which gives exactly equality: there $G = 4/\sqrt{\pi}$, so $\frac{\sqrt{\pi}}{2}G = 2 = \binom{2}{1}$. Hence the result is true in general.
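The bounds (2) and (3) are easy to confirm by exhaustive checking for small n. The sketch below is a minimal check under the stated conventions ($\lambda n = k$, $\mu n = n - k$, natural logarithms); the helper names are ours, and a tiny rounding slack is allowed because (3) is an exact equality at $n = 2$, $k = 1$.

```python
import math

# Illustrative helpers (names are ours, not from the text).

def G(n: int, k: int) -> float:
    """G from (1) with lam = k/n and mu = (n - k)/n; requires 0 < k < n."""
    lam, mu = k / n, (n - k) / n
    return (2 * math.pi * n) ** -0.5 * lam ** (-k - 0.5) * mu ** (-(n - k) - 0.5)

def check_binomial_bounds(n: int, k: int) -> bool:
    """Verify the lower and upper bounds (2) and the lower bound (3) for C(n, k)."""
    g = G(n, k)
    c = math.comb(n, k)
    lower_2 = g * math.exp(-1 / (12 * k) - 1 / (12 * (n - k)))
    lower_3 = math.sqrt(math.pi) / 2 * g
    tol = 1e-12 * g                     # floating-point slack only
    return lower_2 <= c + tol and lower_3 <= c + tol and c <= g + tol
```

For example, `all(check_binomial_bounds(n, k) for n in range(2, 61) for k in range(1, n))` runs through every admissible case up to n = 60.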
The upper and lower bounds (4) for the multinomial are found in exactly the same way as for the binomial.
The tail inequality (5) for the binomial is found by overestimating the tail using an infinite geometric series: for $k \ge \lambda n$ the ratio of successive terms, $\frac{(n-k)p}{(k+1)q}$, does not exceed $\frac{\mu p}{\lambda q}$, which is less than one when $\lambda > p$. This process is familiar (see, for example, Feller), with G in place of the binomial coefficient. The inequality (6) is a special case of Chernoff's inequality, which will be discussed later more generally.
A Lower Bound on the Tail of a Distribution
Let μ(s) be the logarithm of the moment-generating function of a distribution F(x), and assume μ(s) exists in an interval with s = 0 in its interior. Then
$$dF(x) = e^{\mu(s)}e^{-sx}\,dG(x) \qquad (1)$$
where G(x) is the distribution of the tilted random variable obtained from F(x) by the $e^{sx}$ multiplying operation and normalization. G(x) has its mean at μ'(s) and its variance is μ''(s).
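By the Chebyshev inequality
$$G\left(\mu'(s) + \alpha\sqrt{\mu''(s)}\right) - G\left(\mu'(s) - \alpha\sqrt{\mu''(s)} - 0\right) \ge 1 - \frac{1}{\alpha^2}$$
for any positive α. Now integrate equation (1) from $\mu'(s) - \alpha\sqrt{\mu''(s)} - 0$ to $\mu'(s) + \alpha\sqrt{\mu''(s)}$. Since $e^{-sx} \ge e^{-s\mu'(s) - |s|\alpha\sqrt{\mu''(s)}}$ throughout this interval, this gives
$$F\left(\mu' + \alpha\sqrt{\mu''}\right) - F\left(\mu' - \alpha\sqrt{\mu''} - 0\right) = \int e^{\mu(s) - sx}\,dG(x) \ge \left(1 - \frac{1}{\alpha^2}\right)e^{\mu(s) - s\mu'(s) - |s|\alpha\sqrt{\mu''(s)}}. \qquad (2)$$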
This, then, is a lower bound on the probability for the F distribution in a small interval in terms of the logarithm of the moment-generating function.
If F is the convolution of n identical and independent distributions, each with μ(s) for its log moment-generating function, then that for F itself is equal to nμ(s). The width of the interval in question is then $2\alpha\sqrt{n\mu''(s)}$, while the center position (for a fixed s) grows as nμ'(s).
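To see (2) at work for an n-fold convolution, the sketch below assumes the summands are Bernoulli(p), so that F is Binomial(n, p), $\mu(s) = \log(q + pe^s)$, and the tilted distribution is Binomial(n, $p_s$) with $p_s = pe^s/(q + pe^s)$. This choice of distribution and the helper names are ours; the check simply compares the exact interval probability with the right-hand side of (2).

```python
import math

# Illustrative helpers (names are ours). Summands are assumed Bernoulli(p),
# so F is Binomial(n, p) and the log moment-generating function of F is n*mu(s).

def interval_mass(n: int, p: float, lo: float, hi: float) -> float:
    """Binomial(n, p) probability of the closed interval [lo, hi]."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if lo <= k <= hi)

def tilted_lower_bound(n: int, p: float, s: float, alpha: float):
    """Return the interval and the right-hand side of (2) for the n-fold sum."""
    q = 1.0 - p
    n_mu = n * math.log(q + p * math.exp(s))          # n mu(s)
    p_s = p * math.exp(s) / (q + p * math.exp(s))     # tilted success probability
    mean, var = n * p_s, n * p_s * (1.0 - p_s)        # n mu'(s), n mu''(s)
    half = alpha * math.sqrt(var)
    bound = (1 - 1 / alpha**2) * math.exp(n_mu - s * mean - abs(s) * half)
    return (mean - half, mean + half), bound
```

Moving s shifts the interval center $n\mu'(s)$ across the range, which is how (2) reaches into the tails.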
If we integrate (1) from $-\infty$ to $\mu' + \alpha\sqrt{\mu''}$ and assume s < 0, we obtain an underbound on the tail of the distribution F in the negative direction. This gives
$$F\left(\mu' + \alpha\sqrt{\mu''}\right) \ge \int_{-\infty}^{\mu' + \alpha\sqrt{\mu''}} e^{\mu(s) - sx}\,dG(x) \ge \left(1 - \frac{1}{\alpha^2}\right)e^{\mu(s) - s\mu'(s) + s\alpha\sqrt{\mu''(s)}}. \qquad (3)$$
If F is the convolution of n identical distributions, each with μ(s) as the logarithm of its moment-generating function, this becomes
$$F\left(n\mu'(s) + \alpha\sqrt{n\mu''(s)}\right) \ge \left(1 - \frac{1}{\alpha^2}\right)\exp\left[n\left(\mu(s) - s\mu'(s)\right) + s\alpha\sqrt{n\mu''(s)}\right].$$
Thus the argument of F approaches asymptotically, for large n, the argument nμ'(s) appearing in the Chernoff upper bound. Likewise the exponent on the right (and the coefficient $1 - 1/\alpha^2$ can also be included as a term in the exponent) approaches asymptotically the exponent in the Chernoff upper bound.
These inequalities may also be extended to the case where F is a convolution of not necessarily identical distributions with functions $\mu_i(s)$ (i = 1, 2, ..., n). Then for F itself we have $\mu = \sum_i \mu_i$, $\mu' = \sum_i \mu_i'$, and $\mu'' = \sum_i \mu_i''$, and these may be substituted in (2) and (3). It is also evident that these same inequalities for s > 0 give a lower bound on the tail in the positive direction, that is, on $1 - F(\mu' - \alpha\sqrt{\mu''} - 0)$.
Lower Bounds on Multinomial Tails and Terms*
Suppose we have a discrete distribution: a random variable can assume values $v_1 < v_2 < \cdots < v_t$ with probabilities $p_1, p_2, \ldots, p_t$. We wish to establish a lower bound on the size of term that can be found in a small interval when this distribution is convolved with itself n times (that is, terms in the distribution of the sum of n independent variables, each with the given distribution). We first show the existence of a term having a certain size near the mean of the convolved distribution. To do this, the following lemma is first proved.
* Parts of these results were obtained in collaboration with Peter Elias.
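Lemma: For any given n, we can find integers $n_1, n_2, \ldots, n_t$ such that
$$|n_i - p_i n| \le 1 \qquad (1)$$
$$\sum_i n_i = n \qquad (2)$$
$$n\sum_i p_i v_i \le \sum_i n_i v_i \le n\sum_i p_i v_i + \Delta \qquad (3)$$
where $\Delta = \max_i (v_{i+1} - v_i)$.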
Proof: We first find a set of integers $m_i$ which satisfy all the conditions except the upper half of (3), $\sum m_i v_i \le n\sum p_i v_i + \Delta$, and will then derive from these the $n_i$. Choose $m_t$ to be the first integer greater than $p_t n$. Set $m_t - p_t n = \delta_t$. Next, choose $m_1$ as the greatest integer less than $p_1 n$ and set $m_1 - p_1 n = \delta_1$. If $\delta_t + \delta_1 > 0$, take another m from the low end (i.e., $m_2$), the largest integer less than $p_2 n$, and then calculate $\delta_t + \delta_1 + \delta_2$, where $\delta_2 = m_2 - p_2 n$. If this is positive, proceed with $p_3$, etc., until the accumulated sum of δ's first becomes non-positive. When this occurs, terms are taken from the top end of the v range ($p_{t-1}$, $p_{t-2}$, etc., now using the first integer greater than $p_i n$) until the accumulated sum of δ's goes positive.
This process is continued, alternating from one end to the other as the sum of the δ's changes sign, and eventually will end with some index k, having the property that all $m_i$ for i < k satisfy $m_i - p_i n = \delta_i \le 0$, while for all $m_i$ with i > k we have $m_i - p_i n = \delta_i \ge 0$. At each stage of the operation, the total accumulated discrepancy satisfies $|\sum \delta_i| \le 1$. This is true at the beginning, and, arguing inductively, at each stage we add a δ of absolute value less than or equal to one to an accumulated sum of absolute value less than or equal to one and of opposite sign. This leads to the next accumulated sum also being less than or equal to one in absolute value. Hence, when the last assignment, that of $m_k$, is to be made, $|\sum_{i \ne k}\delta_i| \le 1$.
If we let $m_k = n - \sum_{i \ne k} m_i$, then we satisfy $\sum m_i = n$ and also have
$$\delta_k = m_k - p_k n = n - \sum_{i \ne k}(np_i + \delta_i) - p_k n = n - (n - np_k) - \sum_{i \ne k}\delta_i - p_k n = -\sum_{i \ne k}\delta_i.$$
Thus $|\delta_k| \le 1$, and $\sum_{i=1}^{t}\delta_i = 0$.
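Now since $\sum_{i=1}^{t}\delta_i = 0$, we have
$$-\sum_{i=1}^{h}\delta_i = \sum_{i=h+1}^{t}\delta_i,$$
where h is the index of the last negative $\delta_i$ (either $\delta_k$ or $\delta_{k-1}$). Multiplying each side by $v_h$ and using the monotone ordering of the $v_i$, we obtain
$$-\sum_{i=1}^{h}\delta_i v_i \le -v_h\sum_{i=1}^{h}\delta_i = v_h\sum_{i=h+1}^{t}\delta_i \le \sum_{i=h+1}^{t}\delta_i v_i.$$
Hence, using the end expressions in the above inequalities,
$$\sum_{i=1}^{t}\delta_i v_i \ge 0,$$
and therefore
$$\sum_{i=1}^{t} m_i v_i = n\sum_{i=1}^{t}p_i v_i + \sum_{i=1}^{t}\delta_i v_i \ge n\sum_{i=1}^{t}p_i v_i. \qquad (4)$$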
Now, starting with the $m_i$, we can construct a set of $n_i$ which satisfy all the conditions of the lemma. Note first that all the $\delta_i$ for $i \le h$ are non-positive and those for $i > h$ are non-negative. If we replace one of the lower $m_i$, say $m_a$ ($a \le h$), by the next larger integer $m_a + 1$, and simultaneously an $m_b$ ($b > h$) by the next lower integer $m_b - 1$, we retain the properties that the errors in approximation satisfy $|\delta_i| \le 1$ and that their sum be zero (or equivalently, $\sum m_i = n$). However, this reduces the value of $\sum m_i v_i$ by an amount $v_b - v_a$. Starting with the set of $m_i$ just derived, we shall show how by interchanges of this type it is possible to go down from the value $\sum m_i v_i$ by steps none of which is larger than Δ, and eventually arrive at a sum less than or equal to $n\sum p_i v_i$. It will follow that in this sequence of operations there is a stage at which the third condition of the lemma obtains.
The series of steps is constructed as follows. Perform the interchange operation on $m_h$ (the last one with negative δ) and $m_{h+1}$. Since $v_{h+1} - v_h \le \Delta$, the change in the sum due to this interchange is less than or equal to Δ. Now in place of this interchange consider that of h against h + 2, or that of h - 1 against h + 1. The additional change in these cases over that just considered is clearly less than or equal to Δ, being indeed $v_{h+2} - v_{h+1}$ or $v_h - v_{h-1}$. Each further stage would involve adding to one end or the other of the interval already taken; this again changes the sum from that previously obtained by not more than Δ. This process is continued until the ends of the range are reached, that is, until $v_1$ and $v_t$ are used in the interchange. These are now left in the changed state and the process is started again with $m_h$ and $m_{h+1}$. Working outward from these, eventually the numbers $m_2$ and $m_{t-1}$ are used. These are then left in the changed state (that is, at $m_2 + 1$ and $m_{t-1} - 1$) and again the process is started at $m_h$ and $m_{h+1}$. This procedure is continued until the permanently changed m's from one end or the other reach $m_h$ or $m_{h+1}$, so that further steps of this type are not possible. The set of changed $m_i$'s, say $m_i'$, then existing has essentially the reverse property of the original set: the corresponding $\delta_i'$ (that is, $m_i' - p_i n$) satisfy $\delta_i' \ge 0$ for $i \le h'$ and $\delta_i' \le 0$ for $i > h'$, for a certain $h'$. Hence, using essentially the same argument we used in proving (4), we can show that
$$\sum_{i=1}^{t}\delta_i' v_i \le 0.$$
Thus this series of steps has at some stage given a set of integers such that $0 \le \sum_i \delta_i v_i \le \Delta$, namely, the integers at the stage just before this sum goes negative. For these we have, equivalently,
$$n\sum_i p_i v_i \le \sum_i n_i v_i \le n\sum_i p_i v_i + \Delta.$$
This completes the proof of the lemma.
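For small t the integers promised by the lemma can simply be searched for. The sketch below is a brute-force substitute, not the interchange construction of the proof: it enumerates every integer vector with $|n_i - p_i n| \le 1$ and returns the first one meeting (2) and (3). The function name and the floating-point tolerances are our own assumptions.

```python
import itertools
import math

def lemma_integers(p, v, n):
    """Brute-force search (ours, hypothetical) for n_i satisfying (1)-(3) of the lemma.

    p: probabilities p_i; v: strictly increasing values v_i; n: number of summands.
    Enumerates all vectors allowed by (1), so it is practical only for small t.
    """
    target = n * sum(pi * vi for pi, vi in zip(p, v))      # n * sum p_i v_i
    delta = max(b - a for a, b in zip(v, v[1:]))           # Delta
    choices = [[m for m in range(math.floor(pi * n) - 1, math.ceil(pi * n) + 2)
                if abs(m - pi * n) <= 1 + 1e-12]           # condition (1)
               for pi in p]
    for cand in itertools.product(*choices):
        if sum(cand) != n:                                  # condition (2)
            continue
        s = sum(ni * vi for ni, vi in zip(cand, v))
        if target - 1e-9 <= s <= target + delta + 1e-9:     # condition (3)
            return list(cand)
    return None
```

For instance, with p = (0.2, 0.3, 0.5), v = (0, 1, 3) and n = 7, any returned vector has $\sum n_i v_i$ between 12.6 and 14.6.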
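Returning now to the original problem, consider the term in the nth convolved distribution where the value $v_i$ is taken $n_i$ times (i = 1, 2, ..., t), the $n_i$ being those of the lemma. In the multinomial distribution this gives rise to a term of
$$\binom{n}{n_1, n_2, \ldots, n_t}p_1^{n_1}p_2^{n_2}\cdots p_t^{n_t} \ge (2\pi n)^{-(t-1)/2}\prod_{i=1}^{t}\left(\frac{n_i}{n}\right)^{-n_i - 1/2}\exp\left(-\sum_{i=1}^{t}\frac{1}{12n_i}\right)\prod_{i=1}^{t}p_i^{n_i}. \qquad (5)$$
This inequality is an application of the general inequality proved previously for multinomials, with $\lambda_i = n_i/n$. We now wish to simplify this, making use of the fact that the $n_i$ are close to $p_i n$: writing $n_i = p_i n + \delta_i$, we have $|\delta_i| \le 1$. Consider the last terms in the exponential:
$$\sum_i n_i\log p_i - \sum_i n_i\log\frac{n_i}{n} = -\sum_i n_i\log\left(1 + \frac{\delta_i}{p_i n}\right)$$
$$\ge -\sum_i\frac{n_i\delta_i}{p_i n} \qquad (\text{since } \log(1 + x) \le x)$$
$$= -\sum_i\frac{\delta_i^2}{p_i n} \qquad \left(\text{since } \textstyle\sum_i \delta_i = 0\right)$$
$$\ge -\sum_i\frac{1}{p_i n}.$$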
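The first exponential term can be estimated as follows. We now assume that, for each i, $p_i n > 1$ (in other words, that $n > 1/p_{\min}$). It follows that each $n_i \ge 1$ (since $|n_i - p_i n| \le 1$ and $n_i$ is an integer). Using (1) of the lemma,
$$\frac{p_i n}{n_i} = \frac{n_i - \delta_i}{n_i} = 1 - \frac{\delta_i}{n_i} \le 1 + \frac{1}{n_i} \le 2.$$
Thus
$$\sum_i\frac{1}{12n_i} \le \sum_i\frac{1}{6p_i n}.$$
Finally the coefficient in (5) can be underbounded as follows, using $\frac{p_i n}{n_i} = \left(1 + \frac{\delta_i}{p_i n}\right)^{-1} \ge e^{-1/(p_i n)}$:
$$\prod_i\left(\frac{n_i}{n}\right)^{-1/2} = \prod_i p_i^{-1/2}\left(\frac{p_i n}{n_i}\right)^{1/2} \ge \prod_i p_i^{-1/2}\exp\left(-\sum_i\frac{1}{2p_i n}\right).$$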
Collecting these various terms we have the following result.
Theorem: The sum of n independent random variables, each with the same discrete distribution, probability $p_i$ of value $v_i$ (i = 1, 2, ..., t; $v_1 < v_2 < \cdots < v_t$), has a term in the closed interval from $n\sum p_i v_i$ to $n\sum p_i v_i + \Delta$, where $\Delta = \max_i(v_{i+1} - v_i)$, and the term has a value at least
$$\frac{1}{(2\pi n)^{(t-1)/2}\sqrt{p_1 p_2\cdots p_t}}\exp\left(-\frac{5}{3n}\sum_{i=1}^{t}\frac{1}{p_i}\right),$$
provided $n > 1/p_{\min}$.
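The theorem can be checked by computing the convolved distribution exactly. The sketch below assumes integer values $v_i$ (so the sum lives on an integer grid), convolves n times with a dictionary, and compares the largest term in $[n\sum p_i v_i,\ n\sum p_i v_i + \Delta]$ with the stated lower bound; the constant 5/3 collects the three exponential estimates above (1 + 1/2 + 1/6). Helper names are ours.

```python
import math

# Illustrative helpers (names are ours). Values v_i are assumed to be integers.

def sum_distribution(p, v, n):
    """Distribution of the sum of n i.i.d. variables taking value v_i w.p. p_i."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for x, px in dist.items():
            for pi, vi in zip(p, v):
                new[x + vi] = new.get(x + vi, 0.0) + px * pi
        dist = new
    return dist

def theorem_bound(p, n):
    """(2 pi n)^(-(t-1)/2) (prod p_i)^(-1/2) exp(-(5/(3n)) sum 1/p_i)."""
    t = len(p)
    return ((2 * math.pi * n) ** (-(t - 1) / 2) / math.sqrt(math.prod(p))
            * math.exp(-5 / (3 * n) * sum(1 / pi for pi in p)))

def largest_term_near_mean(p, v, n):
    """Largest probability of a sum value in [n sum p_i v_i, n sum p_i v_i + Delta]."""
    dist = sum_distribution(p, v, n)
    lo = n * sum(pi * vi for pi, vi in zip(p, v))
    hi = lo + max(b - a for a, b in zip(v, v[1:]))
    return max(prob for x, prob in dist.items() if lo - 1e-9 <= x <= hi + 1e-9)
```

The bound is far from tight for these small cases, which is expected: it is built for general t and only guarantees the $n^{-(t-1)/2}$ order of the largest term.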
This result may be generalized to give a term of such a distribution anywhere in the possible range. This is done by writing the distribution in terms of the tilted distribution: the sum of independent random variables taking the values $v_i$ with probabilities
$$q_i(s) = \frac{p_i e^{v_i s}}{\sum_j p_j e^{v_j s}}.$$
As we have seen previously, the distribution function of the original sum, $F_n(x)$, is related to that of the tilted distribution, $G_n(x)$, by the equation
$$dF_n(x) = e^{n\mu(s)}e^{-sx}\,dG_n(x).$$
The $G_n$ distribution has a term in the interval from $A = n\mu'(s)$ to $A + \Delta$, since $n\mu'(s)$ is its mean and the previous result applies. This gives a term in the $F_n$ distribution, to the amount stated in the following.
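Theorem: The sum of n independent random variables, each with the same discrete distribution, probability $p_i$ of value $v_i$ ($v_1 < v_2 < \cdots < v_t$; i = 1, 2, ..., t), has a term in the closed interval from A to A + Δ, where $\Delta = \max_i(v_{i+1} - v_i)$ and $n v_{\min} < A < n v_{\max}$. The term will have a value at least
$$\frac{e^{-|s|\Delta}}{(2\pi n)^{(t-1)/2}\sqrt{q_1(s)\cdots q_t(s)}}\exp\left(-\frac{5}{3n}\sum_{i=1}^{t}\frac{1}{q_i(s)}\right)e^{n\mu(s) - sA},$$
where $q_i(s) = p_i e^{v_i s}/\sum_j p_j e^{v_j s}$ and s is chosen to make $A = n\mu'(s) = n\sum_i q_i(s)v_i$, and provided $n > 1/q_{\min}(s)$. The last factor is the Chernoff bound with
$$\mu(s) = \log\sum_i p_i e^{v_i s}, \qquad n\mu'(s) = A.$$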
A Combinatorial Theorem
Theorem: Suppose we have a set of objects $S_1, S_2, \ldots, S_n$ and a number of numerically valued properties (functions) for the objects, $P_1, P_2, \ldots, P_d$. These are non-negative, $P_i(S_k) \ge 0$, and we know the averages of these properties over the objects:
$$A_i = \frac{1}{n}\sum_{k=1}^{n}P_i(S_k), \qquad i = 1, 2, \ldots, d.$$
Then there exists an object $S_p$ for which
$$P_i(S_p) \le d\,A_i, \qquad i = 1, 2, \ldots, d.$$
More generally, given any set of $K_i > 0$ satisfying
$$\sum_{i=1}^{d}\frac{1}{K_i} \le 1,$$
there exists an object $S_p$ for which
$$P_i(S_p) \le K_i A_i, \qquad i = 1, 2, \ldots, d.$$
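Proof: The second part implies the first by taking $K_i = d$. To prove the second part, let $N_i$ be the number of objects for which $P_i(S) > K_i A_i$. Now $A_i \ge \frac{1}{n}N_i K_i A_i$ (since all S's have $P_i$ values $\ge 0$), with strict inequality whenever $N_i > 0$. Hence $N_i < n/K_i$ unless $N_i = 0$. The total number of objects N violating any of the conditions is less than or equal to the sum of the individual $N_i$:
$$N \le \sum_i N_i < n\sum_i\frac{1}{K_i} \le n, \qquad \text{using } \sum_i\frac{1}{K_i} \le 1$$
(the strict inequality holds unless every $N_i = 0$, in which case N = 0). Hence there is at least one object not violating any of the conditions.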
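The theorem is easy to probe on random data. The sketch below assumes the properties are given as a list of rows, `P[i][k]` being the value of property i on object k, and returns the first object that meets every condition $P_i(S_k) \le K_i A_i$; the function name and the data layout are our own.

```python
import random

def find_good_object(P, K):
    """Index k with P[i][k] <= K[i] * A_i for every property i, or None (hypothetical helper).

    P[i][k] >= 0 is property i on object k; the theorem guarantees a result
    whenever sum(1 / K_i) <= 1.
    """
    n = len(P[0])
    A = [sum(row) / n for row in P]                   # averages A_i
    for k in range(n):
        if all(P[i][k] <= K[i] * A[i] for i in range(len(P))):
            return k
    return None
```

Taking `K = [d] * d` reproduces the first form of the theorem.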
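Some Results on Determinants
The root of a determinant equation.
Lemma: Given $f_{ij}(W)$ (i, j = 1, 2, ..., d), continuous functions of W in the range $a \le W \le b$, with $f_{ij}(W) \ge 0$ and $\sum_j f_{ij}(W) > 0$ in this range, and with $\sum_j f_{ij}(a) < 1$ and $\sum_j f_{ij}(b) > 1$ for every i. Then there exists $W_0$, $a < W_0 < b$, and a set of $X_i \ge 0$, $\sum_i X_i = 1$, such that
$$\sum_i X_i f_{ij}(W_0) = X_j, \qquad j = 1, 2, \ldots, d,$$
so that $W_0$ is a root of the determinant equation $|f_{ij}(W) - \delta_{ij}| = 0$.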
Proof: Consider the d-dimensional region R whose points are $(X_1, X_2, \ldots, X_d, W)$, where $X_i \ge 0$, $\sum X_i = 1$, $a \le W \le b$. This is a topological image of a sphere and its interior. Consider the continuous mapping
$$Y_j = \frac{\sum_i X_i f_{ij}(W)}{\sum_{i,k}X_i f_{ik}(W)},$$
$$V = \begin{cases} W + 1 - \sum_{i,j}f_{ij}(W)X_i & \text{if } a < W + 1 - \sum_{i,j}f_{ij}(W)X_i < b,\\ a & \text{if this quantity is} \le a,\\ b & \text{if this quantity is} \ge b.\end{cases}$$
Note that the denominator for $Y_j$ does not vanish because of our assumption that $\sum_j f_{ij}(W) > 0$, and hence the $Y_j$ are well defined. Also the $Y_j$ are non-negative and sum to one, so the mapping carries points $(X_i, W)$ in R continuously into points $(Y_i, V)$ in R. Consequently, by the Brouwer fixed point theorem there exists a point $(X_i, W)$ which is mapped into itself, that is, a point for which $Y_i(X, W) = X_i$ and V = W. The value of W for the fixpoint clearly is not a or b, since these points are moved upward or downward, respectively, by our assumptions. Hence for the fixpoint we have $W = W + 1 - \sum_{i,j}f_{ij}(W)X_i$, or $\sum_{i,j}f_{ij}(W)X_i = 1$. It follows that for the fixpoint
$$X_j = \sum_i X_i f_{ij}(W).$$
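The lemma is an existence result; numerically, one way to locate $W_0$ (a swapped-in technique, not the Brouwer argument above) is bisection on the Perron root of the matrix $f(W)$, which lies below 1 at a and above 1 at b under the lemma's row-sum hypotheses. The sketch below assumes NumPy, strictly positive $f_{ij}$ (so the Perron vector is positive), and a hypothetical family $f_{ij}(W) = c_{ij}W^{k_{ij}}$ used only for illustration.

```python
import numpy as np

def perron_root(M):
    """Largest real eigenvalue of a non-negative matrix (its spectral radius)."""
    return max(np.linalg.eigvals(M).real)

def find_root(f, a, b, tol=1e-12):
    """Bisection (our substitute for the fixed-point proof) for W with Perron root of f(W) = 1.

    Assumes every row sum of f(a) is < 1 and of f(b) is > 1, so the Perron root is
    < 1 at a and > 1 at b; continuity then gives a crossing in between.
    Returns W and X >= 0 with sum(X) = 1 and X f(W) = X.
    """
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if perron_root(f(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    W = 0.5 * (lo + hi)
    vals, vecs = np.linalg.eig(f(W).T)          # left eigenvectors of f(W)
    X = np.abs(vecs[:, int(np.argmax(vals.real))].real)
    return W, X / X.sum()
```

With the hypothetical $c = ((0.2, 0.3), (0.1, 0.4))$, $k = ((1, 2), (2, 1))$, a = 1 and b = 3, the row sums are 0.5 at a and above 2 at b, so the hypotheses hold.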
Let the elements $a_{ij}$ of a matrix be non-negative. Suppose there is an eigenvector A all of whose components are positive, $A_i > 0$, and the corresponding characteristic value is $\lambda_1$. We will show that for any other characteristic value $\lambda_2$ we have $|\lambda_2| \le \lambda_1$. Let $B_i$ be a characteristic vector for $\lambda_2$, where we adjust the length of this vector as follows. Choose its length in such a way that $A_i - |B_i| \ge 0$ for all i and the equality holds for at least one i, say i = h, so that $A_h - |B_h| = 0$. It is clear that this can be done, since with zero length all components of B are less than those of A and, as the length increases continuously, eventually a first one of the $|B_i|$ reaches its corresponding $A_i$. We now have
$$\sum_i A_i a_{ij} = \lambda_1 A_j \qquad (1)$$
$$\sum_i B_i a_{ij} = \lambda_2 B_j \qquad (2)$$
$$\sum_i |B_i|\,a_{ij} \ge |\lambda_2|\,|B_j| \qquad (3)$$
where (3) follows from (2) by the triangle inequality. Subtracting (3) from (1) for j = h,
$$\sum_i (A_i - |B_i|)\,a_{ih} \le \lambda_1 A_h - |\lambda_2|\,|B_h| = (\lambda_1 - |\lambda_2|)A_h. \qquad (4)$$
All terms in the sum at the left are non-negative, and $A_h$ is definitely positive. It follows that $\lambda_1 - |\lambda_2| \ge 0$.
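A quick numerical illustration of this dominance result, assuming NumPy: for a random positive matrix we take the left eigenvector belonging to the eigenvalue of largest real part, confirm it is positive, and check that no eigenvalue exceeds that one in absolute value. The function name and tolerance are ours.

```python
import numpy as np

def dominance_holds(a, tol=1e-9):
    """True if every eigenvalue of a has modulus <= lam1 (hypothetical helper).

    lam1 is the eigenvalue whose left eigenvector A (sum_i A_i a_ij = lam1 A_j)
    is positive, as in the argument above.
    """
    vals, vecs = np.linalg.eig(a.T)           # columns: left eigenvectors of a
    k = int(np.argmax(vals.real))
    lam1 = vals[k].real
    A = vecs[:, k].real
    A = A / A.sum()                           # fix the overall sign
    if not np.all(A > 0):
        raise ValueError("no positive eigenvector found for this matrix")
    return bool(np.all(np.abs(vals) <= lam1 + tol))
```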
The derivative of the eigenvalue of a matrix.
Suppose we have the square matrix $(a_{ij}(s))$, where the elements are differentiable functions of a parameter s. Let V = V(s) be an eigenvalue with corresponding eigenvector $A_i = A_i(s)$, and eigenvector $B_i = B_i(s)$ for the transposed matrix. Thus
$$|a_{ij}(s) - V(s)\delta_{ij}| = 0 \qquad (1)$$
$$\sum_i A_i a_{ij} = V A_j \qquad (2)$$
$$\sum_j a_{ij} B_j = V B_i \qquad (3)$$
Theorem:
$$V'(s) = \frac{\sum_{i,j} A_i a_{ij}' B_j}{\sum_j A_j B_j}.$$
To prove this, differentiate (2) with respect to s:
$$\sum_i A_i' a_{ij} + \sum_i A_i a_{ij}' = V' A_j + V A_j'.$$
Now multiply by $B_j$ and sum on j:
$$\sum_{i,j} A_i' a_{ij} B_j + \sum_{i,j} A_i a_{ij}' B_j = V'\sum_j A_j B_j + V\sum_j A_j' B_j.$$
Using (3) in the first term cancels the last term on the right, giving the desired result (provided $\sum_j A_j B_j \ne 0$).
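The formula for V'(s) can be checked against a finite difference. The sketch below assumes NumPy, uses the Perron eigenvalue of a positive matrix (so V is simple and real), and a hypothetical family $a(s) = M_0 + sM_1 + s^2M_2$; the scaling of A and B does not matter because it cancels in the ratio.

```python
import numpy as np

def perron(M):
    """Largest eigenvalue V of M, a left eigenvector A and a right eigenvector B."""
    vals, right = np.linalg.eig(M)
    k = int(np.argmax(vals.real))
    V, B = vals[k].real, right[:, k].real      # sum_j a_ij B_j = V B_i
    lvals, left = np.linalg.eig(M.T)
    A = left[:, int(np.argmax(lvals.real))].real   # sum_i A_i a_ij = V A_j
    return V, A, B

def derivative_formula(a, da, s):
    """V'(s) = sum_ij A_i a'_ij B_j / sum_j A_j B_j (hypothetical helper)."""
    _, A, B = perron(a(s))
    return A @ da(s) @ B / (A @ B)
```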
Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements
We frequently have to deal with the nth power of a matrix whose elements are $f_{ij} \ge 0$. We denote the ij element of this nth power by $f_{ij}^{(n)}$. We are concerned here with the case where the corresponding graph has the property that it is possible to go from any node i to any other node j by a finite sequence $f_{ii_1}f_{i_1i_2}\cdots f_{i_kj}$ where all the f's in this series are positive. This means that the graph consists of one ergodic or periodic set in the usual Markoff analysis. The non-negative condition on the $f_{ij}$ ensures the existence of a real eigenvalue $v_0$ which is a solution of the determinant equation $|f_{ij} - v\delta_{ij}| = 0$. Further, this $v_0$ dominates in absolute value any other eigenvalue v, that is, $v_0 \ge |v|$.
Corresponding to the root $v_0$ there will exist right and left eigenvectors for the matrix:
$$\sum_j f_{ij}A_j = v_0 A_i,$$
$$\sum_i B_i f_{ij} = v_0 B_j.$$
The conditions $f_{ij} \ge 0$ imply that all the $A_i$ be of the same sign (or vanish) and all the $B_i$ be of the same sign (or vanish). In both cases we take them to be positive (multiply by -1 if necessary). In the case satisfying the graphical condition it is easily seen that all $A_i$ and all $B_i$ are then actually positive (none vanish).
Theorem: Under the conditions above, i.e., $f_{ij} \ge 0$ and any state accessible from any other through a finite sequence of non-vanishing transitions, the elements of the nth power satisfy
$$f_{ij}^{(n)} \le \frac{A_i}{A_j}v_0^n \qquad (1)$$
and
$$\frac{A_i}{A_j} \le \left(\frac{v_0}{t}\right)^d, \qquad (2)$$
where t is the smallest (non-vanishing) $f_{ij}$, and d is an integer such that there is a path from any state i to any state j with not more than d steps (d - 1 intermediate states). Furthermore, there will exist c > 0 and $n_0$ such that
$$f_{ij}^{(n)} \ge c\,v_0^n, \qquad n \ge n_0,$$
provided either (1) for some $n_0$, $f_{ij}^{(n_0)} > 0$ for all i, j, or (2) the state diagram has no recurrent subsets (the greatest common divisor of closed path lengths is 1).
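Proof: The first inequality is proved easily by induction on n. For n = 0,
$$f_{ij}^{(0)} = \delta_{ij} \le \frac{A_i}{A_j},$$
since for $i \ne j$ the right member is positive and $\delta_{ij} = 0$, while for i = j, $\delta_{ij} = 1$ and the right member is one. Now supposing the inequality to hold for n, we prove it for n + 1:
$$f_{ij}^{(n+1)} = \sum_k f_{ik}f_{kj}^{(n)} \le \sum_k f_{ik}\frac{A_k}{A_j}v_0^n = \frac{v_0 A_i}{A_j}v_0^n = \frac{A_i}{A_j}v_0^{n+1}.$$
This is the corresponding inequality for n + 1, concluding the proof.
The second inequality, that $A_i/A_j \le (v_0/t)^d$, is shown as follows. Iterating the eigenvector equation gives $\sum_k f_{jk}^{(p)}A_k = v_0^p A_j$ for every p. Let $p \le d$ be the length of a path from j to i, so that $f_{ji}^{(p)} \ge t^p$; then
$$v_0^p A_j \ge f_{ji}^{(p)}A_i \ge t^p A_i,$$
so that $A_i/A_j \le (v_0/t)^p \le (v_0/t)^d$, since $v_0 \ge t$ ($v_0$ is at least the smallest row sum, and every row contains a positive element).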
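Inequalities (1) and (2) can be checked directly on small examples. The sketch below assumes NumPy and an irreducible non-negative matrix, takes A as the positive right eigenvector for $v_0$, and uses d = (number of states) - 1, which is always a valid path-length bound in a connected state graph. The helper name, tolerance, and example matrices are ours.

```python
import numpy as np

def power_bounds_hold(F, n_max=30, tol=1e-9):
    """Check f_ij^(n) <= (A_i/A_j) v0^n for n <= n_max and A_i/A_j <= (v0/t)^d."""
    m = F.shape[0]
    vals, vecs = np.linalg.eig(F)
    k = int(np.argmax(vals.real))
    v0 = vals[k].real
    A = vecs[:, k].real
    A = A / A.sum()                       # positive right eigenvector: F A = v0 A
    ratio = A[:, None] / A[None, :]       # ratio[i, j] = A_i / A_j
    t = F[F > 0].min()                    # smallest non-vanishing element
    d = m - 1                             # any state reaches any other in <= m - 1 steps
    ok = bool(np.all(ratio <= (v0 / t) ** d * (1 + tol)))
    P = np.eye(m)                         # F^0
    for n in range(n_max + 1):
        ok = ok and bool(np.all(P <= ratio * v0**n * (1 + tol) + tol))
        P = P @ F
    return ok
```

The 3-state example in the checks below has closed paths of lengths 2 and 3, so it is irreducible but sparse, which makes the ratio $A_i/A_j$ bound non-trivial.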
The Number of Sequences of a Given Length
Suppose a number of letters are available whose lengths (or durations) are $a_1, a_2, \ldots, a_g$, and we wish a bound on the number $N(\ell)$ of sequences of total length $\ell$. Here it is assumed that any sequence of letters is allowed. $N(\ell)$ satisfies the difference equation
$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_g),$$
as we see by noting that each sequence of length $\ell$ must end in one or another of the available letters. Furthermore, the boundary conditions may be taken to be $N(\ell) = 0$ for $\ell < 0$ and $N(0) = 1$. Associated with the difference equation is the following characteristic equation:
$$1 = X^{-a_1} + X^{-a_2} + \cdots + X^{-a_g}.$$
Since all the $a_i$ are positive and real, the right-hand member is a strictly monotone decreasing function of X and varies from $\infty$ to 0 when X goes from 0 to $\infty$. Consequently, the characteristic equation has a unique positive real root W.
Theorem: $N(\ell) \le W^{\ell}$.
To prove this, note first that $W^{\ell}$ satisfies the difference equation, since this results on multiplying the characteristic equation (with X replaced by W) by $W^{\ell}$. With regard to the boundary conditions, $W^0 = 1 = N(0)$ and $W^{\ell} > N(\ell)$ when $\ell < 0$. Let a be the smallest of $a_1, a_2, \ldots, a_g$. Then it is possible to proceed by a kind of induction, in steps each of length a, to show that the dominance of $W^{\ell}$ over $N(\ell)$ continues for all $\ell$. In fact, suppose that for $\ell \le L$ we have $N(\ell) \le W^{\ell}$. Then for $\ell$ in the range $L < \ell \le L + a$,
$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_g) \le W^{\ell - a_1} + W^{\ell - a_2} + \cdots + W^{\ell - a_g} = W^{\ell}.$$
Since the inequality is true for $\ell \le 0$, it follows that it is true for all $\ell$.
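The recurrence and the root W are both easy to compute. The sketch below assumes integer durations, counts $N(\ell)$ from the difference equation with the stated boundary conditions, and finds W by bisection on the decreasing function $\sum_i X^{-a_i}$; it returns the upper end of the final bracket, so the computed W never falls below the true root. Helper names are ours.

```python
def count_sequences(durations, L):
    """N(l) for l = 0..L from N(l) = sum_i N(l - a_i), N(0) = 1, N(l < 0) = 0."""
    N = [0] * (L + 1)
    N[0] = 1
    for l in range(1, L + 1):
        N[l] = sum(N[l - a] for a in durations if a <= l)
    return N

def growth_root(durations, tol=1e-13):
    """Unique positive root W of sum_i W^(-a_i) = 1 (bisection; hypothetical helper)."""
    lo, hi = 1.0, 2.0                 # at W = 1 the sum equals g >= 1
    while sum(hi ** -a for a in durations) > 1:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(mid ** -a for a in durations) > 1:
            lo = mid
        else:
            hi = mid
    return hi
```

With durations 1 and 2, $N(\ell)$ runs through the Fibonacci numbers 1, 1, 2, 3, 5, ... and W is the golden ratio $(1 + \sqrt{5})/2$.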
A more general problem of the same sort relates to sequences which are subject to a finite state set of constraints. Thus, suppose there are d states and that in state i, letters of lengths $\ell_{ij}^{(s)}$ are permitted, leading to state j. The index s ranges over the different letters going from state i to state j, and j ranges over the different states which can follow state i. Now let $N_{ij}(\ell)$ be the number of sequences which are possible and which start in state i, end in state j, and are of length $\ell$. These quantities are readily seen to satisfy the difference equations
$$N_{ij}(\ell) = \sum_{k,s}N_{kj}\left(\ell - \ell_{ik}^{(s)}\right), \quad \ell > 0; \qquad N_{ij}(0) = \delta_{ij}; \qquad N_{ij}(\ell) = 0, \quad \ell < 0. \qquad (1)$$
The corresponding characteristic equations are
$$A_i = \sum_{j,s}A_j W^{-\ell_{ij}^{(s)}}. \qquad (2)$$
Let W be the largest real root (there is a positive real root by a previous result based on the fixed point theorem) of the determinant equation:
$$\left|\sum_s W^{-\ell_{ij}^{(s)}} - \delta_{ij}\right| = 0,$$
and let $A_i$ be a corresponding (positive) solution of (2). We will assume the graph of the constraints is fully connected, so it is possible to go from any state to any other. Then all the $A_i$ are positive (none vanish).
We will now show that the number of sequences of length $\ell$ starting in state i and ending in j, $N_{ij}(\ell)$, is bounded by
$$N_{ij}(\ell) \le \frac{A_i}{A_j}W^{\ell}.$$
This is certainly true for $\ell < 0$, and also for $\ell = 0$, since then both sides are one if i = j, and otherwise the left side is zero with the right positive. We now proceed by the inductive type process as before, assuming the inequality out to some $\ell_0$ and then showing that it follows for $\ell$ out to $\ell_0$ plus the minimum $\ell_{ij}^{(s)}$:
$$N_{ij}(\ell) = \sum_{k,s}N_{kj}\left(\ell - \ell_{ik}^{(s)}\right) \le \sum_{k,s}\frac{A_k}{A_j}W^{\ell - \ell_{ik}^{(s)}} = \frac{A_i}{A_j}W^{\ell}.$$
Thus the inductive step carries the inequality up to $\ell_0 + \min \ell_{ij}^{(s)}$, and hence it is true for all $\ell$.
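The constrained bound can be tested the same way. The sketch below assumes NumPy and integer letter lengths, describes the constraints by a hypothetical list of (i, j, length) triples, counts $N_{ij}(\ell)$ by the first-letter recurrence (1), and finds W by bisection on the Perron root of $M(W)_{ij} = \sum_s W^{-\ell_{ij}^{(s)}}$, which decreases as W grows. This numerical route is our own; the text obtains W from the fixed-point lemma.

```python
import numpy as np

# Illustrative helpers (names are ours). edges: (i, j, length) means that in
# state i a letter of that integer length is allowed and leads to state j.

def count_constrained(edges, d, L):
    """N[l, i, j] = number of allowed sequences of length l from state i to j."""
    N = np.zeros((L + 1, d, d))
    N[0] = np.eye(d)                                # N_ij(0) = delta_ij
    for l in range(1, L + 1):
        for i, k, length in edges:                  # first letter goes i -> k
            if length <= l:
                N[l, i, :] += N[l - length, k, :]
    return N

def char_matrix(edges, d, W):
    """M(W)_ij = sum over letters from i to j of W^(-length)."""
    M = np.zeros((d, d))
    for i, j, length in edges:
        M[i, j] += W ** -length
    return M

def constrained_growth(edges, d, tol=1e-13):
    """W where the Perron root of M(W) is 1, and a positive A with M(W) A = A.

    Assumes a connected constraint graph, so the Perron root is >= 1 at W = 1.
    """
    rho = lambda W: max(np.linalg.eigvals(char_matrix(edges, d, W)).real)
    lo, hi = 1.0, 2.0
    while rho(hi) >= 1:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho(mid) >= 1:
            lo = mid
        else:
            hi = mid
    W = 0.5 * (lo + hi)
    vals, vecs = np.linalg.eig(char_matrix(edges, d, W))
    A = vecs[:, int(np.argmax(vals.real))].real
    return W, A / A.sum()
```

A single state with letters of lengths 1 and 2 reduces to the unconstrained case, so W should again be the golden ratio.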
An Alternative Proof that $N(\ell) \le W^{\ell}$
Consider the case of a sequence of letters of different lengths $a_1, a_2, \ldots, a_g$ with no constraints. We wish to prove that $N(\ell) \le W^{\ell}$, where W satisfies $\sum_i W^{-a_i} = 1$. Assume, in contradiction, that for some $\ell$, $N(\ell) > W^{\ell}$. Then, since $N(0) \le W^0$, there is a greatest lower bound of $\ell$'s, say $\ell^*$, for which the theorem fails. In the interval $\ell^* \le \ell < \ell^* + a_1$ there must be an $\ell$, say $\ell_1$, for which the theorem fails ($a_1$ is the smallest $a_i$). Subdivide the sequences of length $\ell_1$ into subsets according to the first letter. Let the fractional number in the subset beginning with the letter i be $f_i$ (i = 1, 2, ..., g). Choose the subset for which $a_i^{-1}\log f_i^{-1}$ is a minimum. In a sense, this means the subset which conveys the least information, $\log f_i^{-1}$, per unit time in its first letter. The minimum value of $a_i^{-1}\log f_i^{-1}$ among the different subsets is less than or equal to $\log W$. To see this, suppose, in contradiction, that for all i, $a_i^{-1}\log f_i^{-1} > \log W$. Then $f_i < W^{-a_i}$ and, summing on i, $1 = \sum_i f_i < \sum_i W^{-a_i} = 1$, a contradiction. Hence the subset chosen will have $a_i^{-1}\log f_i^{-1} \le \log W$, or $f_i \ge W^{-a_i}$. If we delete the first letter from all sequences in this subset, we are left with a set of more than $W^{\ell_1 - a_i}$ sequences of length $\ell_1 - a_i$. Thus $N(\ell_1 - a_i) > W^{\ell_1 - a_i}$. Since $\ell_1 - a_i < \ell^*$, this contradicts the assumption that $\ell^*$ was the greatest lower bound of $\ell$'s for which the theorem fails. Hence the theorem is true for all $\ell$.
Characteristic for a Language with Independent Letters
Suppose we have a stochastic process generating a language consisting of a sequence of independent letters. These letters are all chosen with the probabilities $p_i$ for letter i, i = 1, 2, ..., g. We consider sequences of n such letters, that is, words of length n in the language. Suppose that all such words are arranged in order of decreasing probability, from the most probable one, consisting of a sequence of n most probable letters, down to the sequence of n least probable letters. The logarithm of the probability of any particular word is (because of the independence of letters) the sum of the logarithms of the probabilities of the individual letters. Thus, the logarithm of the probability of a word is a random variable which is the sum of n independent random variables, each with the same distribution function. We may, therefore, apply previous results concerning the tails of such a distribution to estimate the probability in our monotone sequence of all words beyond a certain point.
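The distribution of $\log p^{-1}$ for a single letter will have a moment-generating function
$$M(s) = \sum_i p_i e^{s\log p_i^{-1}} = \sum_i p_i^{1-s}.$$
Hence
$$\mu(s) = \log\sum_i p_i^{1-s},$$
$$\mu'(s) = \frac{\sum_i p_i^{1-s}\log p_i^{-1}}{\sum_i p_i^{1-s}}. \qquad (1)$$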
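Our upper bound on the tail of a distribution then shows that, for $s \ge 0$, the total probability $P_T$ of all sequences whose individual probability P satisfies
$$\frac{1}{n}\log P^{-1} \ge \mu'(s) = \frac{\sum_i p_i^{1-s}\log p_i^{-1}}{\sum_i p_i^{1-s}} \qquad (2)$$
is bounded by
$$\frac{1}{n}\log P_T \le \mu(s) - s\mu'(s) = \log\sum_i p_i^{1-s} - s\,\frac{\sum_i p_i^{1-s}\log p_i^{-1}}{\sum_i p_i^{1-s}}. \qquad (3)$$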
This last expression, as well as (1), can be written more compactly in terms of a new set of probabilities $q_i(s)$:
$$q_i(s) = \frac{p_i^{1-s}}{\sum_j p_j^{1-s}}.$$
The relations (2) and (3) now become, after some manipulation,
$$\frac{1}{n}\log P^{-1} \ge \sum_i q_i(s)\log p_i^{-1}, \qquad -\frac{1}{n}\log P_T \ge \sum_i q_i(s)\log\frac{q_i(s)}{p_i}. \qquad (4)$$
This is one of the results we desire, an overbound on the tail of the distribution of probability for sequences.
We now desire a similar bound on the number of sequences whose probability is greater than P. To this end, consider constructing all sequences of length n, giving each letter probability 1/g (instead of the probabilities $p_i$ they actually have). We again consider the distribution of the sum of the logarithms of the probabilities (using the original $p_i$ values) for the letters in a word. Note that the sequences arranged in monotone order are in the same order as previously. Under these new conditions the moment-generating function $M_2(s)$ and its logarithm $\mu_2(s)$ are given by
$$M_2(s) = \frac{1}{g}\sum_i p_i^{-s},$$
$$\mu_2(s) = \log\sum_i p_i^{-s} - \log g.$$
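The total probability $P_2$ of all sequences in the tail of the distribution beyond the sequences whose individual probability P satisfies
$$\frac{1}{n}\log P^{-1} \le \mu_2'(s) = \frac{\sum_i p_i^{-s}\log p_i^{-1}}{\sum_i p_i^{-s}}$$
will be bounded (for $s \le 0$) by
$$\frac{1}{n}\log P_2 \le \mu_2(s) - s\mu_2'(s) = \log\sum_i p_i^{-s} - s\,\frac{\sum_i p_i^{-s}\log p_i^{-1}}{\sum_i p_i^{-s}} - \log g.$$
We note first that in this modified probability system (each letter with probability 1/g) all sequences have probability $g^{-n}$, and consequently the number of sequences $N_2$ in the tail whose total probability is $P_2$ is precisely $P_2 g^n$. Hence the number $N_2$ in the tail is bounded by
$$\frac{1}{n}\log N_2 = \frac{1}{n}\log\left(P_2 g^n\right) = \frac{1}{n}\log P_2 + \log g \le \log\sum_i p_i^{-s} - s\,\frac{\sum_i p_i^{-s}\log p_i^{-1}}{\sum_i p_i^{-s}}. \qquad (5)$$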
In order to compare this result with the preceding one (4), we must identify the points at which the tails of the distributions are cut off. This can be done by equating the probabilities P of the individual sequences at the cutoff point. Thus, using (1) and (5) and writing $s_2$ in the latter in place of s, we have
$$\frac{\sum_i p_i^{1-s}\log p_i^{-1}}{\sum_i p_i^{1-s}} = \frac{\sum_i p_i^{-s_2}\log p_i^{-1}}{\sum_i p_i^{-s_2}}.$$
This is obviously satisfied by $1 - s = -s_2$, and since $\mu''(s) > 0$ the left term is a strictly monotone function of s, and therefore this solution is unique.
The number of sequences now becomes, in terms of the s involved in (1) and (4),
$$\frac{1}{n}\log N_2 \le \log\sum_i p_i^{1-s} + (1 - s)\,\frac{\sum_i p_i^{1-s}\log p_i^{-1}}{\sum_i p_i^{1-s}}.$$
Again using the $q_i(s)$ to simplify,
$$\frac{1}{n}\log N_2 \le \sum_i q_i(s)\log q_i(s)^{-1}. \qquad (6)$$
Both the bounds (4) and (6) are also the limiting values approached by $-\frac{1}{n}\log P_T$ and $\frac{1}{n}\log N_2$ as $n \to \infty$. This follows from remarks concerning the tails of distributions made in an earlier section. Thus the reliability curve of a source of the type we are discussing here, with independent letters, may be written in parametric form as follows:
$$E(s) = \sum_i q_i(s)\log\frac{q_i(s)}{p_i} = s\mu'(s) - \mu(s) \qquad (7)$$
$$R(s) = \sum_i q_i(s)\log q_i(s)^{-1} = (1 - s)\mu'(s) + \mu(s) \qquad (8)$$
where
$$q_i(s) = \frac{p_i^{1-s}}{\sum_j p_j^{1-s}}. \qquad (9)$$
The parameter s in these equations is related to the slope of the reliability curve. In fact, we note that
$$\frac{dE}{dR} = \frac{dE/ds}{dR/ds} = \frac{s\mu''(s)}{(1 - s)\mu''(s)} = \frac{s}{1 - s}.$$
Thus, as s increases from 0 to 1, the slope increases monotonically from 0 to $\infty$. It is interesting that at s = 1 the formulas (7) and (8) become
$$E(1) = \frac{1}{g}\sum_i\log p_i^{-1} - \log g,$$
$$R(1) = \log g.$$
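The parametric curve (7)-(9) is straightforward to evaluate. The sketch below assumes natural logarithms and a small example distribution of our own choosing; it computes E(s) and R(s) directly from the $q_i(s)$, so the endpoint values E(0) = 0, R(0) = H, R(1) = log g and the slope $s/(1-s)$ can all be checked numerically.

```python
import math

def source_reliability(p, s):
    """E(s) and R(s) of (7)-(9) for independent letters with probabilities p (hypothetical helper)."""
    w = [pi ** (1 - s) for pi in p]
    z = sum(w)
    q = [wi / z for wi in w]                                   # q_i(s), eq. (9)
    E = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))    # eq. (7)
    R = sum(-qi * math.log(qi) for qi in q)                    # eq. (8)
    return E, R
```

At s = 0 the $q_i$ coincide with the $p_i$, so the curve starts at rate equal to the source entropy with zero reliability; sweeping s toward 1 traces the curve out to $R = \log g$.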
The Probability of Error
A problem of importance in information theory is that of studying the behavior of signaling codes that may be used in encoding an information source for a noisy channel and, in particular, the probability of error for the optimal code. This paper is concerned with estimating this probability of error under fairly general conditions.
We will find that, to a large extent, the problem can be divided into two parts. First, there is a problem relating to the information source only (not involving the channel), which involves estimating the probability of error when the source is encoded into a simple standard noiseless channel. The study of this question leads to a certain function which we call the reliability characteristic for the source, and which determines, in a certain asymptotic sense when the code blocks are long, how rapidly the probability of error approaches zero. Second, there is a problem relating to the channel only. This leads to a function describing, in a sense, the coding behavior of the channel with regard to probability of error when the code blocks are long. Our final and most basic results show how the two functions may be combined to give optimal behavior (or bounds on optimal behavior) when the source is encoded into the channel.
We will first clarify our terminology, since various writers have used some of the terms involved with quite different meanings. For the most part, we will restrict ourselves to a finite, discrete, memoryless channel. Such a channel is specified by a transition probability matrix $\|p_i(j)\|$. Here $p_i(j)$ is the probability that if input symbol i is used, the output will be j, and we have
$$p_i(j) \ge 0, \qquad \sum_j p_i(j) = 1.$$
Matrices satisfying the conditions that all elements are non-negative and the row sums are unity occur often in probability and are called stochastic matrices.
The input symbols to the channel will be called the input letters, and the set of these the input alphabet. The output symbols of the channel will be called the output letters, and the set of these the output alphabet. A channel is often conveniently represented by a line diagram of the type shown in Fig. 1.
The channel being memoryless means that successive operations are independent. If the input letters i and j are used, the probability of output letters k and $\ell$ will be $p_i(k)\,p_j(\ell)$. A sequence of input letters will be called an input word, and a sequence of output letters an output word. A collection of M input words, all of length n, will be called a block code of length n, and $R = \frac{1}{n}\log M$ will be called the input rate for this code. Unless otherwise specified, a code will mean such a block code.
A detection system for a code is a method of interpreting output words as input words, that is, an association or mapping of one of the input words of the code to every output word of length n. The probability of error for a particular input word is the probability, if this input is used, that it will be interpreted incorrectly. It is, therefore, the probability of that input word being received as an output word which is not detected as that input word. The probability of error for a code is the average probability of error for all input words in the code.
An optimal code of length n is one which minimizes this probability of error (when using its best detection system). The input words $u_1, u_2, \ldots, u_M$ of a code need not all be different.
Our main problem is to estimate, for a general channel, upper and lower bounds on the probability of error for an optimal code as a function of the length of the code n and the rate of transmission R. The ideal solution would be to find a simple explicit formula for the probability of error in an arbitrary channel as a function of the rate of transmission R and the length of the code words n. This is probably too much to hope for in view of the diophantine complexities of optimal codes. Barring such a complete solution, one may still hope for upper and lower bounds on $P_e$, and perhaps results relating to its asymptotic behavior when n is large. Most of the present paper is devoted to this type of result.
In studying the asymptotic behavior, it will appear that $P_e$, for a fixed rate R and a given channel, varies approximately exponentially with n. For this reason it is convenient to introduce a new term. If a device or a system has a probability P of making an error, we shall call $-\log P$ the reliability of the device or system. We have just said, in effect, that for large n the reliability for optimal codes varies essentially linearly with n, that is, as $E(R)\,n$, where R is the rate for the code. More precisely, we define E(R) as follows:
$$E(R) = \limsup_{n \to \infty} -\frac{1}{n}\log P_{e,\mathrm{opt}}.$$
We will call E(R) the reliability characteristic of the channel and attempt to evaluate it, or, where we cannot do this, at least place upper and lower bounds on it.
The writer feels that the quantity we have defined as reliability
will, in many cases, turn out to be the most appropriate way of measuring
probability of error. In previous work by von Neumann on unreliable
neuron-type elements and by E. F. Moore and the writer on unreliable
relays, the quantity -log P entered significantly and was the more
natural way to describe some of the results. In both these cases the
reliability varied rather simply with the redundancy of the error-correcting
systems. It is a little like measuring gain on a db scale or ion
concentration on a pH scale. While actually little more than a change
in scale, the use of these units of reliability in the coding case
throws the results into a much more natural and illuminating perspective.
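A minimal numerical sketch of the reliability scale, under assumptions that are not taken from the text: a binary symmetric channel with crossover probability 0.1 and an n-fold repetition code decoded by majority vote. Because a repetition code's rate falls as 1/n, this only illustrates how -log P grows roughly linearly with redundancy, in the spirit of the relay and neuron examples; it is not an estimate of E(R) at a fixed rate. Logarithms are to base 2.

```python
import math

def repetition_error_prob(n: int, p: float) -> float:
    """Exact error probability of an n-fold repetition code (n odd) on a binary
    symmetric channel with crossover probability p, decoded by majority vote."""
    if n % 2 == 0:
        raise ValueError("use odd n so that majority vote never ties")
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

def reliability_bits(prob_error: float) -> float:
    """Reliability -log2 P of a device whose probability of error is P."""
    return -math.log2(prob_error)

# Hypothetical example: crossover probability 0.1, increasing redundancy.
for n in (1, 3, 5, 11, 21):
    pe = repetition_error_prob(n, 0.1)
    rel = reliability_bits(pe)
    print(f"n={n:2d}  P_e={pe:.3e}  reliability={rel:6.2f} bits  per letter={rel / n:.3f}")
```

On the reliability scale, successive increases in redundancy add roughly equal amounts, much as gains add on a db scale.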
If we have two given channels, it is possible to form a single
channel from them in two natural ways, which we call the sum and the product
of the two channels. The sum of two channels is the channel formed by
using inputs from either of the two given channels, with the same transition
probabilities, to the set of output letters consisting of the logical
sum of the two output alphabets. Thus the sum channel is defined by a
transition matrix formed by placing the matrix of one channel below and
to the right of that for the other channel and filling the remaining two
rectangles with zeros. If ||p_i(j)|| and ||p'_i(j)|| are the individual
matrices, the sum has the following matrix:
$$\begin{pmatrix}
p_1(1) & \cdots & p_1(r) & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
p_t(1) & \cdots & p_t(r) & 0 & \cdots & 0 \\
0 & \cdots & 0 & p'_1(1) & \cdots & p'_1(r') \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & p'_{t'}(1) & \cdots & p'_{t'}(r')
\end{pmatrix}$$
The product of two channels is the channel whose input alphabet
consists of all ordered pairs (i, i'), where i is a letter from the first
channel alphabet and i' from the second, whose output alphabet is the
similar set of ordered pairs of letters from the two individual output
alphabets, and whose transition probability from (i, i') to (j, j') is
p_i(j) p'_{i'}(j').

Fig. 2
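Both constructions can be written directly in terms of transition matrices. A minimal sketch, assuming each channel is a Python list of rows with P[i][j] = p_i(j); the two example channels are hypothetical. Ordered pairs are numbered row-major, so input (i, i') becomes index i * t' + i', where t' is the number of inputs of the second channel.

```python
def sum_channel(P1, P2):
    """Sum channel: P1 and P2 placed as diagonal blocks, with the two
    remaining rectangles filled with zeros."""
    r1, r2 = len(P1[0]), len(P2[0])
    top = [list(row) + [0.0] * r2 for row in P1]
    bottom = [[0.0] * r1 + list(row) for row in P2]
    return top + bottom

def product_channel(P1, P2):
    """Product channel: input (i, i') is row i * t2 + i', output (j, j') is
    column j * r2 + j', and the entry is p_i(j) * p'_{i'}(j')."""
    t2, r1, r2 = len(P2), len(P1[0]), len(P2[0])
    return [[P1[i][j] * P2[i2][j2] for j in range(r1) for j2 in range(r2)]
            for i in range(len(P1)) for i2 in range(t2)]

# Hypothetical example: a binary symmetric channel and a noiseless ternary channel.
bsc = [[0.9, 0.1], [0.1, 0.9]]
noiseless3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S = sum_channel(bsc, noiseless3)      # 5 inputs, 5 outputs
T = product_channel(bsc, noiseless3)  # 6 inputs, 6 outputs
```

The product matrix is what linear-algebra libraries call the Kronecker product of the two matrices; every row of either result still sums to one.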
Zero Error Codes and the Zero Error Capacity C_0
In a discrete channel we will say that two input letters are adjacent
if there is an output letter which can be caused by either of these two.
Thus, i and j are adjacent if there exists a t such that both p_i(t) and
p_j(t) do not vanish. In Fig. 1, a and c are adjacent, while a and d are not.
If all input letters are adjacent to each other, any code with more
than one word has a probability of error greater than zero. In fact, the
probability of error satisfies
$$P_e \ge \frac{M-1}{M}\, p_{\min}^{\,n},$$
where p_min is the smallest non-vanishing p_i(j), n is the length of the code
and M is the number of words in the code. To prove this, note that any
two words have a possible output word in common, namely the word consisting
of the sequence of common output letters when the two input words are
compared letter by letter. Each of the two input words has a probability
at least p_min^n of producing this common output word. In using the code,
the two particular input words will each occur 1/M of the time and will
cause the common output (1/M) p_min^n of the time. This output can be decoded
in only one way. Hence at least one of these situations leads to an error.
This error, (1/M) p_min^n, is assigned to this code word, and from the remaining
M - 1 code words another pair is chosen. A source of error to the amount
(1/M) p_min^n is assigned in similar fashion to one of these, and this is a
disjoint event. Continuing in this manner, we obtain a total of ((M-1)/M) p_min^n
probability of error.

It follows that for any rate R greater than zero (i.e., M ≥ 2),
$$-\frac{1}{n}\log P_e \le -\log p_{\min} + \frac{1}{n}\log 2, \qquad E \le -\log p_{\min}.$$
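The bound can be checked by brute force on a small example. This sketch assumes a binary symmetric channel with crossover probability 0.1 (so the two input letters are adjacent) and computes the exact error probability of a code under maximum-likelihood decoding, which is the best detection system when the words are equally likely. The channel and the codes are illustrative choices.

```python
import itertools
import math

def ml_error_probability(W, code):
    """Exact P_e of a block code on the memoryless channel W (W[i][j] = p_i(j)),
    with equally likely words and maximum-likelihood decoding."""
    n = len(code[0])
    r = len(W[0])
    correct = 0.0
    for y in itertools.product(range(r), repeat=n):
        # Each received word y is decoded into the code word most likely to produce it.
        correct += max(math.prod(W[u[k]][y[k]] for k in range(n)) for u in code)
    return 1.0 - correct / len(code)

def all_adjacent_bound(W, M, n):
    """Lower bound (M - 1)/M * p_min**n, p_min the smallest non-vanishing p_i(j)."""
    p_min = min(p for row in W for p in row if p > 0)
    return (M - 1) / M * p_min ** n

bsc = [[0.9, 0.1], [0.1, 0.9]]
repetition = [(0, 0, 0), (1, 1, 1)]
pe = ml_error_probability(bsc, repetition)
print(pe, all_adjacent_bound(bsc, len(repetition), 3))
```

The bound is far from tight here; its role in the text is to show that E cannot exceed -log p_min.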
If it is not true that the input letters are all adjacent to each
other, it is possible to transmit at a positive rate with zero probability
of error. The least upper bound of all rates which can be achieved with
zero probability of error will be called the zero error capacity of the
channel and denoted by C_0. If we let M_0(n) be the largest number of
words in a code of length n, no two of which are adjacent, then C_0 is
the least upper bound of the numbers (1/n) log M_0(n) when n varies through
all positive integers. An interesting problem which has not been completely
solved is that of evaluating C_0 for an arbitrary channel.
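For very small channels M_0(n) can be found by exhaustive search. This sketch assumes the channel of Fig. 2 has the pentagon adjacency structure (each letter adjacent to its two cyclic neighbours), which agrees with the example discussed next, and represents adjacency by a 0/1 matrix with ones on the diagonal. The search is exponential and is meant only as an illustration.

```python
import itertools
import math

def words_adjacent(A, u, v):
    """Words are adjacent when, letter by letter, they are adjacent or equal."""
    return all(A[a][b] for a, b in zip(u, v))

def max_nonadjacent_code(A, n):
    """A largest set of pairwise non-adjacent words of length n, so that
    len(result) = M_0(n). Exhaustive, for tiny alphabets and lengths only."""
    words = list(itertools.product(range(len(A)), repeat=n))
    best = []

    def extend(chosen, candidates):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for idx, w in enumerate(candidates):
            if len(chosen) + len(candidates) - idx <= len(best):
                return  # even taking every remaining candidate cannot beat best
            rest = [x for x in candidates[idx + 1:] if not words_adjacent(A, w, x)]
            extend(chosen + [w], rest)

    extend([], words)
    return best

# Assumed pentagon adjacency for Fig. 2: i is adjacent to i - 1 and i + 1 (mod 5).
pentagon = [[1 if (i - j) % 5 in (0, 1, 4) else 0 for j in range(5)] for i in range(5)]
for n in (1, 2):
    m0 = len(max_nonadjacent_code(pentagon, n))
    print(n, m0, math.log2(m0) / n)  # (1/n) log2 M_0(n), a lower bound on C_0 in bits
```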
One might expect that C_0 would be equal to log M_0(1), that is, that
if we choose the largest possible set of non-adjacent letters and form
all sequences of these of length n, then this would be the best error-free
code of length n. This is not, in general, true, although it holds
in many cases, particularly when the number of input letters is small.
The first failure occurs with five input letters with the channel in Fig. 2.
In this channel, it is possible to choose at most two independent letters,
for example 0 and 2. Using sequences of these, 00, 02, 20, and 22, we
obtain four words in a code of length two. However, it is possible
to construct a code of length two with five members, no two of which are
adjacent, as follows: 00, 12, 24, 31, 43. It is readily verified that
no two of these are adjacent. Thus, C_0 for this channel is at least (1/2) log 5.
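The "readily verified" step can be checked mechanically. This sketch again assumes Fig. 2 is the pentagon channel in which letter i is adjacent to i - 1 and i + 1 (mod 5); two words are adjacent only if they are adjacent or equal in every position.

```python
from itertools import combinations
from math import log2

def pentagon_letters_adjacent(a, b):
    # Assumed adjacency for Fig. 2: letter i touches i - 1 and i + 1 (mod 5).
    return (a - b) % 5 in (0, 1, 4)

def words_adjacent(u, v):
    return all(pentagon_letters_adjacent(a, b) for a, b in zip(u, v))

code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
print(all(not words_adjacent(u, v) for u, v in combinations(code, 2)))  # True
print(log2(len(code)) / 2)  # rate of this code in bits per letter
```

The five words follow the pattern (i, 2i mod 5), so any two of them differ by a non-adjacent step in at least one position.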
No method has been found for determining C_0 for the general discrete
channel, and this we propose as an important unsolved problem in coding
theory. We shall develop a number of results which enable one to determine
C_0 in many special cases, for example, in all channels with five or fewer
inputs, with the single exception of the channel of Fig. 2 (or channels
equivalent in adjacency structure to it). We will also develop some
general inequalities enabling one to estimate C_0 quite closely in most
cases.
It may be seen, in the first place, that the value of C_0 depends
only on which input letters are adjacent to each other. Let us define
an adjacency matrix A_ij for a channel as follows:
$$A_{ij} = \begin{cases} 1 & \text{if input letter } i \text{ is adjacent to } j \text{ or if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
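The adjacency matrix depends only on which transition probabilities vanish. A minimal sketch, assuming the channel is given as rows W[i][t] = p_i(t); the five-letter channel in the example is hypothetical, chosen so that its adjacency graph is a pentagon.

```python
def adjacency_matrix(W):
    """A[i][j] = 1 if i == j or some output letter has non-vanishing probability
    from both input letters i and j; otherwise 0."""
    t, r = len(W), len(W[0])
    return [[1 if i == j or any(W[i][s] > 0 and W[j][s] > 0 for s in range(r)) else 0
             for j in range(t)] for i in range(t)]

# Hypothetical five-letter channel: input i produces output i or i + 1 (mod 5),
# each with probability 1/2.
W = [[0.5 if s in (i, (i + 1) % 5) else 0.0 for s in range(5)] for i in range(5)]
for row in adjacency_matrix(W):
    print(row)
```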
Suppose two channels have the same adjacency matrix (possibly after
renumbering the input letters of one of them). Then it is obvious that
a zero error code for one will be a zero error code for the other and,
hence, that the zero error capacity C_0 for one will also apply to the other.
The adjacency structure contained in the adjacency matrix can also
be represented as a linear graph. Construct a graph with as many vertices
as there are input symbols, and connect two distinct vertices with a line
or branch of the graph if the corresponding input letters are adjacent.
Some examples are shown in Fig. 3, corresponding to the channels of Figs. 1 and 2.
Theorem: The zero error capacity C_0 of a discrete memoryless channel is
bounded by the inequalities
$$-\log \min_{P_i} \sum_{i,j} A_{ij} P_i P_j \;\le\; C_0 \;\le\; \min_{p_i(j)} C,$$
where C is the capacity of any channel with transition probabilities p_i(j)
and having the adjacency matrix A_ij. The upper bound is fairly obvious.
The zero error capacity is certainly less than or equal to the ordinary
capacity for any channel, since the former requires codes with zero
probability of error while the latter requires codes approaching zero
probability of error. By minimizing the capacity through variation of the
p_i(j) we find the lowest upper bound available through this argument.
Since the capacity is a continuous function of the p_i(j) in the closed
region defined by 0 ≤ p_i(j) ≤ 1, Σ_j p_i(j) = 1, we may write min instead of
greatest lower bound.
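The text does not say how C itself is to be computed. One standard numerical method, swapped in here and not the author's, is the Blahut-Arimoto iteration. The sketch below assumes rows W[i][j] = p_i(j) and uses a hypothetical pentagon-type channel in which each input goes to one of two outputs with probability 1/2; its capacity, log2 5 - 1 = log2(5/2) bits, is an upper bound on C_0 for any channel with that adjacency graph.

```python
import math

def capacity_bits(W, iters=5000, tol=1e-12):
    """Capacity in bits per letter of the discrete memoryless channel
    W[i][j] = p_i(j), approximated by the Blahut-Arimoto iteration."""
    t, r = len(W), len(W[0])
    P = [1.0 / t] * t
    for _ in range(iters):
        q = [sum(P[i] * W[i][j] for i in range(t)) for j in range(r)]
        # D[i]: divergence (nats) between row i and the current output distribution q.
        D = [sum(W[i][j] * math.log(W[i][j] / q[j]) for j in range(r) if W[i][j] > 0)
             for i in range(t)]
        weights = [P[i] * math.exp(D[i]) for i in range(t)]
        total = sum(weights)
        new_P = [w / total for w in weights]
        converged = max(abs(a - b) for a, b in zip(new_P, P)) < tol
        P = new_P
        if converged:
            break
    # Mutual information of the final input distribution, in bits.
    q = [sum(P[i] * W[i][j] for i in range(t)) for j in range(r)]
    return sum(P[i] * W[i][j] * math.log2(W[i][j] / q[j])
               for i in range(t) for j in range(r) if W[i][j] > 0)

pentagon_channel = [[0.5 if s in (i, (i + 1) % 5) else 0.0 for s in range(5)] for i in range(5)]
print(capacity_bits(pentagon_channel))  # log2(5/2), about 1.32 bits
```

Other channels with the same adjacency graph give other values of C; the theorem's upper bound is the smallest of them.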
It is worth noting that it is only necessary to consider a particular
channel in performing this minimization, although there are an infinite
number with the same adjacency matrix. This one particular channel is
obtained as follows from the adjacency matrix. If A_ik = 1 for a pair i, k,
define an output letter j with p_i(j) and p_k(j) both differing from zero.
Now if there are any three input letters, say i, k, l, all adjacent to each
other, define an output letter, say m, with p_i(m), p_k(m), p_l(m) all different
from zero. In the graph this corresponds to a complete subgraph with
three vertices. Next, subsets of four letters, or complete subgraphs of
four vertices, say i, k, l, m, are given an output letter, each being
connected to it, and so on. It is evident that any channel with the same
adjacency matrix differs from that just described only by variation in
the number of output symbols for some of the pairs, triplets, etc., of
adjacent input letters. If a channel has more than one output symbol for
an adjacent subset of input letters, then its capacity is reduced (or left
unchanged) by identifying these. If a channel contains no element, say, for a
triplet i, k, l of adjacent input letters, this will occur as a special case of our
canonical channel, which has output letter m for this triplet, when p_i(m),
p_k(m) and p_l(m) all vanish.
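The output alphabet of this canonical channel can be listed by enumerating the complete subgraphs of the adjacency graph. A minimal sketch; including single letters (so each input may also have a private output) is an assumption added here, and the example graphs are hypothetical.

```python
def canonical_outputs(A):
    """One output letter per set of pairwise adjacent input letters, i.e. per
    complete subgraph of the adjacency graph, single letters included."""
    t = len(A)
    cliques = [(i,) for i in range(t)]
    frontier = list(cliques)
    while frontier:
        grown = []
        for c in frontier:
            # Extend only by higher-numbered letters so each subset is produced once.
            for v in range(c[-1] + 1, t):
                if all(A[u][v] for u in c):
                    grown.append(c + (v,))
        cliques.extend(grown)
        frontier = grown
    return cliques

pentagon = [[1 if (i - j) % 5 in (0, 1, 4) else 0 for j in range(5)] for i in range(5)]
triangle = [[1] * 3 for _ in range(3)]
print(len(canonical_outputs(pentagon)), canonical_outputs(triangle))
```

The free parameters of the minimization are then the non-vanishing probabilities from each input letter to the outputs of the subsets containing it.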
The lower bound of the theorem will now be proved. We use the
procedure of random codes based on probabilities P_i for the letters,
these being chosen to minimize the quadratic form Σ_ij A_ij P_i P_j. Construct
an ensemble of codes each containing M words, each word n letters long.
The words in a code are chosen by the following probability method. Each
letter of each word is chosen independently of all others and has the value
i with probability P_i. We now compute the probability in the ensemble
that any particular word is not adjacent to any other word in its code.
The probability that the first letter of one word is adjacent to the first
letter of a second word is Σ_ij A_ij P_i P_j, since this sums the cases of
adjacency with coefficient 1 and those of non-adjacency with coefficient
0. The probability that two words are adjacent in all letters, and therefore
adjacent as words, is (Σ_ij A_ij P_i P_j)^n. The probability of non-adjacency
is therefore 1 - (Σ_ij A_ij P_i P_j)^n. The probability that all M - 1
other words in a code are not adjacent to a given word is, since they are
chosen independently,
$$\Bigl[1 - \Bigl(\sum_{i,j} A_{ij}P_iP_j\Bigr)^{n}\Bigr]^{M-1},$$
which is, by a well-known inequality, greater than or equal to
1 - (M-1)(Σ_ij A_ij P_i P_j)^n, which in turn is greater
than 1 - M(Σ_ij A_ij P_i P_j)^n. If we set
$$M = (1-\epsilon)^n \Bigl(\sum_{i,j} A_{ij}P_iP_j\Bigr)^{-n},$$
we then have, by taking ε small, a rate as close as desired to -log Σ_ij A_ij P_i P_j.
Furthermore, once ε is chosen, by taking n sufficiently large, we can
insure that M(Σ_ij A_ij P_i P_j)^n = (1-ε)^n is as small as desired, say, less than δ.
The probability in the ensemble of codes of a particular word being
adjacent to any other in its own code is now less than δ. This implies
that there are codes in the ensemble for which the ratio of the number of
such undesired words to the total number in the code is less than or equal
to δ. For, if not, the ensemble average would be worse than δ. Select
such a code and delete from it the words having this property. We have
reduced our rate only by at most log(1-δ)^{-1}. Since ε and δ were
both arbitrarily small, we obtain error-free codes arbitrarily close to
the rate -log min_{P_i} Σ_ij A_ij P_i P_j, as stated in the theorem.
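Since the random-code argument works for any letter distribution P, every P gives a valid zero-error rate -log Σ A_ij P_i P_j. This sketch evaluates that rate and, as a heuristic of our own (not the minimization in the theorem), searches only over distributions uniform on a subset of letters. The pentagon adjacency is the assumed example; logarithms are to base 2.

```python
from itertools import combinations
from math import log2

def quadratic_form(A, P):
    """sum_ij A_ij P_i P_j: probability that two independent letters are adjacent or equal."""
    t = len(A)
    return sum(A[i][j] * P[i] * P[j] for i in range(t) for j in range(t))

def random_code_rate(A, P):
    """Zero-error rate -log2 sum_ij A_ij P_i P_j reached by the random-code argument for P."""
    return -log2(quadratic_form(A, P))

def best_subset_uniform_rate(A):
    """Heuristic: try P uniform on every subset of letters and keep the best rate.
    Each value is a valid lower bound on C_0; the search need not find the optimal P."""
    t = len(A)
    best = 0.0
    for k in range(1, t + 1):
        for S in combinations(range(t), k):
            P = [1.0 / k if i in S else 0.0 for i in range(t)]
            best = max(best, random_code_rate(A, P))
    return best

pentagon = [[1 if (i - j) % 5 in (0, 1, 4) else 0 for j in range(5)] for i in range(5)]
print(random_code_rate(pentagon, [0.2] * 5))  # log2(5/3), about 0.74 bits
print(best_subset_uniform_rate(pentagon))     # 1.0 bit, from P uniform on letters 0 and 2
```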
For simple channels it is usually more convenient to apply particular
tricks in trying to evaluate C_0 instead of the bounds given in this theorem,
which involve maximizing and minimizing processes. The simplest lower
bound, as mentioned before, is obtained by merely finding the logarithm
of the maximum number of non-adjacent input letters.
A useful device for establishing an upper bound depends upon the
adjacency graph for the input symbols. Suppose two vertices a and b of
this graph have the property that they are connected together and every
vertex that a is connected to, b is also connected to (but not necessarily
conversely). Then vertex b and all lines connected to b may be eliminated
from the graph, leaving an adjacency graph for channels with the same
zero error capacity. This may be proved by constructing from any error-free
code for channels with the first graph an error-free code with the
same number of words for the second graph. This is done by replacing in
all words of the first code the letter for vertex b wherever it occurs
by the letter for vertex a. This introduces no new adjacencies
among words, since a is adjacent to no points that were not already adjacent
to b.
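The elimination device can be applied repeatedly until no such pair a, b remains. A minimal sketch, assuming the graph is a dict mapping each vertex to the set of vertices connected to it (no self-loops); the example graphs are hypothetical. When the result consists of isolated vertices, C_0 is the logarithm of their number.

```python
def eliminate_dominated(adj):
    """Repeatedly delete a vertex b (with its lines) whenever some vertex a is
    connected to b and every other vertex connected to a is also connected to b."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for a in sorted(adj):
            for b in sorted(adj[a]):
                if adj[a] - {b} <= adj[b]:
                    for w in adj.pop(b):
                        adj[w].discard(b)
                    changed = True
                    break
            if changed:
                break
    return adj

path = {0: {1}, 1: {0, 2}, 2: {1}}
pentagon = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(eliminate_dominated(path))  # {0: set(), 2: set()}: two isolated vertices, so C_0 = log 2
print(eliminate_dominated(pentagon) == pentagon)  # True: the device does not apply
```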
Another device which is useful for finding upper bounds is that of
eliminating lines in the graph. Eliminating one or more lines in a graph
can only increase or leave constant C_0, since any zero-error code for the
old channel will be zero-error for the new channel. By careful choice
of one or more lines to eliminate, the graph may be reduced to one for
which C_0 is readily evaluated, and if this C_0 equals the lower bound
found by choosing a subset of non-adjacent letters, then this gives the
zero-error capacity.
… 3, as well as others, may be described in more general terms as
an adjacency-reducing mapping. Suppose that we
can find a mapping of letters into other letters, i → α(i), with the
property that if i and j are not adjacent in the channel (or graph) then
α(i) and α(j) are not adjacent. If we have a zero-error code, then we
may apply such a mapping letter by letter to the code and obtain a new
code which will also be of the zero-error type, since no adjacencies can
be produced by the mapping. If all of the letters i are mapped into a
subset of the letters, no two of which are adjacent, then it is easily
seen that the zero-error capacity of the original channel is the logarithm
of the number of letters in this subset. For, in the first place, by
forming all sequences of these letters we obtain a zero-error code at this
rate. Secondly, any code in the channel can be mapped into a code using
only these letters and containing, therefore, at most 2^{C_0 n} non-adjacent words.
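Checking a proposed mapping is mechanical. A minimal sketch, assuming the adjacency matrix convention above (A[i][j] = 1 for adjacent or equal letters) and a mapping given as a list alpha with alpha[i] the image of letter i; the four-letter "square" graph 0-1-2-3-0 is a hypothetical example.

```python
from itertools import combinations

def is_adjacency_reducing(A, alpha):
    """True if letters that are not adjacent always map to letters that are not adjacent."""
    return all(A[alpha[i]][alpha[j]] == 0
               for i, j in combinations(range(len(A)), 2) if A[i][j] == 0)

def n0_from_mapping(A, alpha):
    """If alpha is adjacency-reducing and its image is a set of mutually
    non-adjacent letters, return that set's size N_0 (C_0 = log N_0); else None."""
    image = sorted(set(alpha))
    if not is_adjacency_reducing(A, alpha):
        return None
    if any(A[a][b] for a, b in combinations(image, 2)):
        return None
    return len(image)

square = [[1 if (i - j) % 4 in (0, 1, 3) else 0 for j in range(4)] for i in range(4)]
print(n0_from_mapping(square, [0, 0, 2, 2]))  # 2, so C_0 = log 2
```

Mapping letter 1 onto 0 and letter 3 onto 2 works here because the only non-adjacent pairs, (0, 2) and (1, 3), both land on the non-adjacent pair (0, 2).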
The capacities, or, more exactly, the equivalent numbers of input
symbols, for all graphs up to five vertices are shown in Fig. 4. These can
all be found readily by the tricks mentioned above, excepting the channel
of Fig. 2 mentioned previously, for which we know only that the zero-error
capacity lies in the range (1/2) log 5 ≤ C_0 ≤ log (5/2).
All graphs with six vertices have been examined, and the capacities
of all of these can also be found by these devices, with the exception of
four. These four can be given in terms of the capacity of Fig. 2, so that
this latter graph is essentially the only unsolved problem up to seven
vertices. Graphs with seven vertices have not been completely examined,
but at least one new situation arises, the analog of Fig. 2 with seven
input letters.
Theorem: If two channels have zero-error capacities C_0' and C_0'', their
sum has a zero-error capacity greater than or equal to log(exp C_0' + exp C_0'')
and their product a zero-error capacity greater than or equal to C_0' + C_0''.
If the graph of either of the two channels can be reduced to non-adjacent
points by the mapping method, then these inequalities can be replaced
by equalities.
Proof: It is clear that in the case of the product, the zero-error
capacity is at least C_0' + C_0'', since we may form a product code from two
codes which are close to C_0' and C_0''. If these codes are not of the same
length, we use for the new code the least common multiple of the individual
lengths and form all sequences of the code words of each of the codes
up to this length. To prove equality in case one of the graphs, say that
for the first channel, can be mapped into non-adjacent points, suppose
we have a code for the product channel. The letters for the product code,
of course, are ordered pairs of letters corresponding to the original
channels. Replace the first letter in each pair in all code words by the
letter corresponding to reduction by the mapping method. This reduces
or preserves adjacency between words in the code. Now sort the code
words into A^n subsets according to the sequences of first letters in the
ordered pairs, where A = exp C_0' is the number of non-adjacent points. Each
of these subsets can contain at most B^n members, B = exp C_0'', since this is the
largest possible number of code words for the second channel of this length.
Thus, in total, there are at most A^n B^n words in the code, giving the desired result.
In the case of the sum of the two channels, we first show how, from
two given codes for the two channels, to construct a code for the sum
channel with equivalent number of letters equal to A^{1-δ} + B^{1-δ}, where
δ is arbitrarily small and A and B are the equivalent numbers of letters
for the two codes. Let the two codes have lengths n_1 and n_2. The new
code will have length n, where n is the smallest integer greater than both
n_1/δ and n_2/δ. Now form codes for the first channel and for the second channel
for all lengths k from zero to n as follows. Let k equal a n_1 + b, where
a and b are integers and b < n_1. We form all sequences of a words from
the given code for the first channel and fill in the remaining b letters
arbitrarily, say all with the first letter in the code alphabet. We achieve
at least A^{k-δn} different words of length k, none of which is adjacent to
any other. In the same way we form codes for the second channel and
achieve B^{k-δn} words in this code of length k. We now intermingle the
k code for the first channel with the n - k code for the second channel
in all C(n, k) possible ways and do this for each value of k. This produces
a code n letters long with at least
$$\sum_{k=0}^{n}\binom{n}{k}A^{k-\delta n}B^{n-k-\delta n} = (AB)^{-\delta n}(A+B)^{n}$$
different words. It is readily seen that none of these different words
are adjacent. The rate is at least log(A + B) - δ log AB, and since δ
was arbitrarily small, we can achieve a rate arbitrarily close to log(A + B).
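The intermingling step can be sketched for the special case in which each component code simply uses all sequences of mutually non-adjacent letters, so the padding and the δ of the argument are not needed. The letter tagging (1, x) / (2, y) that keeps the two alphabets disjoint, and the example channels, are assumptions made for illustration.

```python
from itertools import combinations, product

def sum_channel_code(code1, code2, n):
    """Intermingle codes for the two channels into length-n words for the sum
    channel. code1[k] and code2[k] list each channel's code words of length k."""
    words = []
    for k in range(n + 1):
        for positions in combinations(range(n), k):
            chosen = set(positions)
            for u in code1[k]:
                for v in code2[n - k]:
                    it_u, it_v = iter(u), iter(v)
                    words.append(tuple((1, next(it_u)) if p in chosen else (2, next(it_v))
                                       for p in range(n)))
    return words

def letters_adjacent(A1, A2, x, y):
    """Letters from different channels are never adjacent in the sum channel."""
    if x[0] != y[0]:
        return False
    A = A1 if x[0] == 1 else A2
    return A[x[1]][y[1]] == 1

# Hypothetical example: channel 1 has two non-adjacent letters (A = 2),
# channel 2 has a single letter (B = 1).
A1, A2, n = [[1, 0], [0, 1]], [[1]], 2
code1 = {k: list(product(range(2), repeat=k)) for k in range(n + 1)}
code2 = {k: [(0,) * k] for k in range(n + 1)}
words = sum_channel_code(code1, code2, n)
print(len(words))  # 9 = (A + B) ** n
```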
To show that it is not possible, when one of the graphs reduces to
non-adjacent points, to exceed the rate corresponding to the number of
letters A + B, consider any particular code of length n for the sum channel.
The words in this consist of sequences of letters each corresponding to
one or the other of the two channels. The words may be subdivided into
classes corresponding to the pattern of the choices of letters between the
two channels. There are 2^n such classes, with C(n, k) classes in which exactly
k of the letters are from the first channel and n - k from the second.
Consider now a particular class of words of this type. Replace the
letters from the first channel alphabet by the corresponding non-adjacent
letters. This does not harm the adjacency relations between words in the
code. Now, as in the product case, partition the code words according to
the sequence of letters involved from the first channel. This produces
at most A^k subsets. Each of these subsets contains at most B^{n-k} members,
since this is the greatest possible number of non-adjacent words for the
second channel of length n - k. In total, then, summing over all values
of k and taking account of the C(n, k) classes for each k, there are at most
$$\sum_{k=0}^{n}\binom{n}{k}A^{k}B^{n-k} = (A+B)^{n}$$
words in the code for the sum channel. This
proves the desired result. We conjecture but have not been able to prove
that the equality of this theorem holds in general, not merely under the
conditions given.
Theorem: In any code of length n and rate R > C_0, C_0 > 0, the probability of
error P_e will satisfy
$$P_e \ge \bigl(1 - e^{-n(R - C_0)}\bigr)\, p_{\min}^{\,n},$$
where p_min is the minimum non-vanishing p_i(j). Thus for R > C_0, E(R) ≤ -log p_min.
Proof: By definition of C_0 there are not more than e^{nC_0} words of length n,
no two of which are adjacent. With R > C_0, among e^{nR} words there must,
therefore, be an adjacent pair. The adjacent pair has a common output word which
either can cause with a probability at least p_min^n. This output word cannot be
decoded into both inputs. At least one, therefore, must cause an error
when it leads to this output word. This gives a contribution at least
e^{-nR} p_min^n to the probability of error P_e. Now omit this word from
consideration and apply the same argument to the remaining e^{nR} - 1 words of the
code. This will give another adjacent pair and another contribution of
error of at least e^{-nR} p_min^n. The process may be continued until the
number of code points remaining is just e^{nC_0}. At this time, the probability
of error must be at least (e^{nR} - e^{nC_0}) e^{-nR} p_min^n, or the expression
given in the theorem.
FIG. 4. All graphs with 1, 2, 3, 4 and 5 nodes and the corresponding N_0 for channels with these as adjacency graphs (note C_0 = log N_0). Panels: One Node, Two Nodes, Three Nodes, Four Nodes, Five Nodes.
Lower Bound for P_ef for a Completely Connected Channel with Feedback
Theorem:
$$P_{ef} \ge \Bigl(1 - \frac{1}{M}\Bigr) p_{\min}^{\,n},$$
where M is the number of messages, the channel is assumed completely connected, and
p_min is the minimum transition probability. Note if …

Proof: Choose any two messages m and m'. (If there
is only one message, the theorem is trivially true.) Let x_1
and x_1' be the first transmitted letters for m and m'.
Since the channel is completely connected, x_1 and x_1' have
a common output letter, say y_1. Determine the second transmitted
letters for m and m' if y_1 is received and let these
be x_2 and x_2'. These must have a possible common received
letter y_2. Find the third transmitted letters for m when
y_1 y_2 was received and for m' when y_1 y_2 was received. Let
these be x_3 and x_3'. Continue this process to give a received
sequence y_1, y_2, ..., y_n which might occur with either
m or m'. Each could cause this sequence with probability
greater than or equal to p_min^n. At the receiver this sequence
must be decoded in a unique way; hence one, at least, of
m and m' would be decoded incorrectly if it caused this received
sequence. Say this is m; then m can cause errors
to the amount at least (1/M) p_min^n. Now, eliminating m from
further consideration, take any pair of messages from the
remaining M - 1 (including m'). The same argument may be
applied to this pair to give a second source of error, disjoint
from the first, to the amount (1/M) p_min^n. Continuing in
this way, we can arrive at M - 1 disjoint sources of error,
each at least (1/M) p_min^n, a total of at least (1 - 1/M) p_min^n,
proving the theorem.
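The feedback bound can be checked exhaustively on a tiny example. This sketch assumes a binary symmetric channel with crossover probability 0.2 (completely connected), two messages, block length 3 and maximum-likelihood decoding; the encoding rule f(k, prefix), which gives the next letter for message k after the received letters in `prefix` have come back, is represented as an ordinary Python function, and both rules shown are hypothetical.

```python
import itertools
import math

def feedback_error_probability(W, f, M, n):
    """Exact P_e of a length-n block code with feedback on the memoryless channel
    W (W[x][y] = p_x(y)), with M equally likely messages and ML decoding."""
    r = len(W[0])
    correct = 0.0
    for y in itertools.product(range(r), repeat=n):
        # Probability that message k produces the received word y, for each k.
        likelihoods = [math.prod(W[f(k, y[:j])][y[j]] for j in range(n)) for k in range(M)]
        correct += max(likelihoods)
    return 1.0 - correct / M

def feedback_lower_bound(W, M, n):
    """(1 - 1/M) p_min^n for a completely connected channel."""
    p_min = min(p for row in W for p in row)
    return (1 - 1 / M) * p_min ** n

def no_feedback(k, prefix):   # plain repetition code
    return k

def parity_rule(k, prefix):   # hypothetical rule that uses the fed-back letters
    return (k + sum(prefix)) % 2

bsc = [[0.8, 0.2], [0.2, 0.8]]
for rule in (no_feedback, parity_rule):
    print(rule.__name__, feedback_error_probability(bsc, rule, 2, 3), feedback_lower_bound(bsc, 2, 3))
```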
A Lower Bound for P_e when R > C

Theorem: For any code with rate R > C, R - C = δ, with
block length n > (2/δ) log 2, we have
$$P_e \ge \frac{\delta}{4\,(R - \log p_{\min})}.$$
Hence for any fixed δ > 0, P_e is bounded away from zero.

Proof: With ρ the distribution function of (1/n) I(u, v),
$$P_e \ge \tfrac{1}{2}\,\rho\Bigl(R - \tfrac{1}{n}\log 2\Bigr) \ge \tfrac{1}{2}\,\rho\Bigl(R - \tfrac{\delta}{2}\Bigr) = \tfrac{1}{2}\,\rho\Bigl(C + \tfrac{\delta}{2}\Bigr).$$
For any pair of words (u, v) such that p(u, v) > 0,
the mutual information I(u, v) satisfies
$$\frac{1}{n} I(u,v) = \frac{1}{n}\log\frac{p_u(v)}{p(v)} \ge \frac{1}{n}\log p_{\min}^{\,n} = \log p_{\min}.$$
Now whatever distribution p(u) is used, the mean of (1/n) I(u, v)
is less than or equal to C (by the very definition of C).
Thus we have a distribution function ρ(I) which is zero for I < log p_min
and whose mean is less than or equal to C. This implies a lower
bound on ρ(C + δ/2). In fact, we must have
$$\rho\Bigl(C + \tfrac{\delta}{2}\Bigr) \ge \frac{\delta/2}{C + \delta/2 - \log p_{\min}},$$
for if not, the mean of the distribution would be at least
$$\rho\Bigl(C+\tfrac{\delta}{2}\Bigr)\log p_{\min} + \Bigl(C+\tfrac{\delta}{2}\Bigr)\Bigl[1-\rho\Bigl(C+\tfrac{\delta}{2}\Bigr)\Bigr] > \frac{(\delta/2)\log p_{\min} + (C+\delta/2)(C-\log p_{\min})}{C+\delta/2-\log p_{\min}} = C.$$
This is a contradiction, and consequently
$$P_e \ge \frac{1}{4}\cdot\frac{\delta}{C + \delta/2 - \log p_{\min}} \ge \frac{1}{4}\cdot\frac{\delta}{R - \log p_{\min}}.$$
A Lower Bound for P_e

We will say that the input letters in a channel are uniform if each
of these letters has the same set of values for transition probabilities
to output letters (not necessarily to the same output letters), that is, in the
p_i(j) matrix each row is some rearrangement of the numbers in the first
row. If this is true, it is clear that the transition probabilities
for words of length n in this channel have the same property. In
fact, the transition probabilities from a particular input word will consist
of the r^n products that can be formed from the r transition probabilities
for the original channel taken n at a time with repetition allowed.
Suppose that when these r^n transition probabilities are arranged in order
of decreasing value, the total probability after element number d
is Q_n(d).
Theorem: In a channel with uniform input letters, r output letters and
the function Q_n(d), any block code of length n and rate R has a probability
of error P_e satisfying
$$P_e \ge Q_n\Bigl(\bigl[e^{n(\log r - R)}\bigr] + 1\Bigr),$$
where the brackets denote the integer part.
Proof: Suppose we have given a code with e^{nR} words. The probability of
not making an error, 1 - P_e, may be computed by taking the probability
of use for each word, e^{-nR}, and multiplying by the sum of the transition
probabilities from that word to all output words which are decoded as the
given word. When summed over all input words in the code, this gives
1 - P_e. Thinking in terms of the matrix of word transition probabilities,
this means that a certain selected set of entries from each row is added
together and the final result multiplied by e^{-nR}. The total number of
entries added in all the different rows is exactly equal to r^n, since this
is the total number of output words and each is decoded into exactly one
input word. The sum of elements in a particular row is increased or
unchanged if we take, in place of the given elements, the same number of
elements chosen in order of decreasing value. Because of the assumption
of uniform inputs, all the rows have the same sequence of values when
arranged in monotone decreasing order. Thus our first operation has
served to give us the sum of e^{nR} (one for each row) beginnings of this
sequence, of various lengths.
If any two of the rows have different numbers of elements added
into the sum, we can again increase or leave unchanged the total by
equalizing (as nearly as possible) the number of terms from the two rows,
since this replaces smaller valued terms by larger ones. Proceeding in
this manner we increase or leave constant the sum while holding the total
number of terms at exactly r^n. When the equalization of number of entries
from rows has proceeded as far as possible, the number in each row will be
within one of r^n/e^{nR}. More precisely, let r^n/e^{nR} equal A + B/e^{nR}, where
A and B are integers and B < e^{nR}. Then B of the rows will have A + 1
terms and the remaining e^{nR} - B will have A terms. We will then have
$$1 - P_e \le e^{-nR}\Bigl[B\bigl(1 - Q(A+1)\bigr) + \bigl(e^{nR} - B\bigr)\bigl(1 - Q(A)\bigr)\Bigr],$$
$$P_e \ge e^{-nR}\Bigl[B\,Q(A+1) + \bigl(e^{nR} - B\bigr)\,Q(A)\Bigr] \ge Q(A+1) = Q_n\Bigl(\bigl[e^{n(\log r - R)}\bigr] + 1\Bigr).$$
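For small n the bound can be evaluated directly. A minimal sketch, assuming a binary symmetric channel with crossover probability 0.1 (its two rows are rearrangements of each other, so the inputs are uniform) and taking M = e^{nR} code words, so that [e^{n(log r - R)}] = [r^n / M].

```python
import itertools
import math

def tail_Q(q, n, d):
    """Q_n(d): total probability after element number d when the r**n products of
    n transition probabilities drawn (with repetition) from the row q are sorted
    in decreasing order."""
    probs = sorted((math.prod(c) for c in itertools.product(q, repeat=n)), reverse=True)
    return sum(probs[d:])

def uniform_input_bound(q, n, M):
    """P_e >= Q_n([r**n / M] + 1) for any block code of length n with M words."""
    return tail_Q(q, n, len(q) ** n // M + 1)

q = (0.9, 0.1)
print(uniform_input_bound(q, 3, 2))  # about 0.019; the repetition code has P_e = 0.028
print(uniform_input_bound(q, 3, 8))  # about 0.19; using all eight words gives P_e = 0.271
```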
Lower Bound with One Type of Input and Many Types of Output

The inequality we have proved holds in any case where the inputs are
uniform. However, it may be strengthened in certain cases. Suppose that
the input letters (or words) are uniform in the sense previously defined
and that the output letters (or words) can be partitioned into a number
of subsets S_1, S_2, ..., with the following property. Each input letter
(or word) has the same set of transition probabilities leading to words
in S_i as any other input word, for each i. Thus, the channel looks uniform
for output words when only the input words and output words in any particular
S_i are considered. Let N_i be the number of output words in S_i.
Let Q_i(d) be the probability in the tail for S_i analogous to the Q(d) of
the preceding theorem. Thus Q_i(d) is the total probability after d elements
in the monotone decreasing ordered sequence of all probabilities from
an input word to the output words in S_i.

We may argue precisely as we did before for each particular S_i and
obtain a lower bound for the probability of errors occurring with received
signals in the set S_i. The total probability of error P_e is greater than
or equal to the sum on i of these individual contributions:
$$P_e \ge \sum_i Q_i\Bigl(\bigl[N_i e^{-nR}\bigr] + 1\Bigr).$$
A more general case may be defined as follows. Suppose the input
words can be partitioned into subsets T_1, T_2, ..., T_c and the output words
into subsets S_1, S_2, ..., and the channel is uniform in transitions
from input set T_i to output set S_j, that is, every member of T_i has the
same array of transition probabilities to members of S_j. It is always
trivially possible to perform such a partitioning by placing all input
letters in different subsets and all output letters also in different
subsets. More significantly, if we consider words of length n, we may
perform this partitioning by subdividing the input words into subsets
according to their composition in terms of letters. Thus, if the letters
in the channel are a, b, ..., g, a composition is defined by a set of
integers n_a, n_b, ..., n_g whose sum is n. All words with exactly n_a a's,
n_b b's, ..., n_g g's will be placed in the corresponding input class.
This class would then have n!/(n_a! n_b! ... n_g!) members. In an exactly
similar way the output words of length n can be partitioned into compositions
in terms of output letters. It is immediately seen that each word
in a particular input class has the same transition probabilities to a
certain output class as any other word in the same input class. Thus this
decomposition is of the type we are considering.
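The claim that composition classes give such a partition can be confirmed by exhaustive enumeration for short words. This sketch uses a hypothetical two-input, three-output channel that is not uniform, and exact fractions, so that products taken in different orders compare equal.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import prod

def composition(word, alphabet_size):
    """Numbers of occurrences of each letter 0, 1, ..., alphabet_size - 1."""
    counts = Counter(word)
    return tuple(counts[a] for a in range(alphabet_size))

def classes(alphabet_size, n):
    """Partition all words of length n into composition classes."""
    out = {}
    for w in product(range(alphabet_size), repeat=n):
        out.setdefault(composition(w, alphabet_size), []).append(w)
    return out

def is_uniform_between_classes(W, n):
    """Check that every member of each input class has the same multiset of
    transition probabilities to the members of each output class."""
    t, r = len(W), len(W[0])
    out_classes = classes(r, n)
    for members in classes(t, n).values():
        for outputs in out_classes.values():
            profiles = {tuple(sorted(prod((W[a][b] for a, b in zip(u, v)), start=Fraction(1))
                                     for v in outputs))
                        for u in members}
            if len(profiles) != 1:
                return False
    return True

# Hypothetical non-uniform channel with 2 inputs and 3 outputs.
W = [[Fraction(7, 10), Fraction(2, 10), Fraction(1, 10)],
     [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]]
print(is_uniform_between_classes(W, 3))  # True
```

Class sizes agree with the multinomial count: for example, the words of length 4 over three letters with composition (2, 1, 1) number 4!/(2! 1! 1!) = 12.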
We return now to the calculation of a lower bound for P_e with a
given number of code words M = e^{nR}. Our procedure is similar to that used
previously; we perform operations which reduce (or leave unchanged) the
probability of error and arrive eventually at an easily computed value.
Suppose a given code has M_i members in input class T_i (i = 1, 2, ..., c).
Let N_j, as before, be the total number of words in output class S_j, and let
Q_ij(d) be the total probability in the tail beyond entry d when the transition
probabilities from a member of set T_i to the words in S_j are arranged
in a monotone decreasing sequence. There will be errors in output set
S_j at least to the amount min_i Q_ij([N_j e^{-nR}] + 1). This is true since we may
reduce the probability of error by equalizing the tails as before for all
words from the same input class. Then one may again reduce or leave
unchanged the probability of error by replacing words from other input
classes by that which minimizes the expression. The details are simple.
The total P_e can be bounded from below by summing this over all output
classes:
$$P_e \ge \sum_j \min_i Q_{ij}\Bigl(\bigl[N_j e^{-nR}\bigr] + 1\Bigr).$$
Another lower bound can be obtained by a slightly different argument.
If there are c input classes and e^{nR} input words, there must be a class
with at least e^{nR}/c input words. If class i contains this many words,
the probability of error will be bounded below by the expression of the
uniform-input type for class i, since the situation is the one covered by the
uniform input result. If we minimize this on i, then we will certainly have
a lower bound for P_e regardless of which class contains the e^{nR}/c or more
code words.
A somewhat stronger but more complex lower bound on P_e can be obtained
by a still different variation of these arguments.
Let
Q_ij(P) = probability in the tail of the monotone sequence of
transition probabilities from input set i to output
set j, the tail consisting of probabilities less than
P in value;
N_ij(P) = total number of terms in this sequence with
probabilities greater than or equal to P;
a_j = total number of words in output set j.
Then we will show that the probability of error P_e satisfies
$$P_e \ge \min_{M_1 + \cdots + M_c = M} \; e^{-nR} \sum_j \sum_i M_i\, Q_{ij}(P_j), \tag{1}$$
where the P_j satisfy
$$\sum_i M_i\, N_{ij}(P_j) \ge a_j. \tag{2}$$
The argument here is similar to those before. We assume M_i messages
coded into input set i. To obtain the maximum probability in the parts of
the tails of the distributions, they should be equalized as nearly as possible
to end at the same value of probability for the last term taken.
While this equalization will not, in general, come out even, a value P_j
satisfying (2) will be small enough that all the tails of the different
sequences beyond this P_j will cause error after the nearest possible
equalization. Thus, P_e will have a lower bound given by (1). The minimizing,
of course, takes account of the most favorable possible way of
dividing the M messages among the input classes.
Application of "Sphere-packing" Bounds to Feedback Case ,
In the uniform input case, the lower bounds on the probability of
error based on the sphere-packing type of argument apply also -o memory-
less discrete channels which have a feedback link giving information at
the transmitter concerning the previous received letter.
To show this, suppose we have such a uniform input case, where the
input letters all have the same set of transition probabilities going to
output letters. Suppose we have a block code for the feedback system of
length n. This means that at the transmitting point there is a device with
two inputs or, mathematically, a function with two arguments. One argument
is the message to be transmitted, the other the past received letters
(which have come in over the feedback link). The value of the function is
the next letter to be transmitted. Thus, the function may be thought of as
x_{j+1} = f(k, v_j), where x_{j+1} is the (j + 1)th transmitted letter in a
block, k is an index ranging from 1 to e^{nR} and represents the specific
message, and v_j is a received word of length j. Thus j ranges from 0 to
n - 1 and v_j over all received words of these lengths.
In operation, if message m_k is to be sent, f is evaluated as f(k, -),
where the - means "no word", and this is sent as the first transmitted
letter. If the feedback link sends back α, say, as the received letter, the
next transmitted letter will be f(k, α). If this is received as β, the next
transmitted letter will be f(k, α, β), etc.
Remembering our assumption about uniformity, the first transmitted letter
for any message gives rise to a set of received letters with probabilities
q_1, q_2, ..., q_b (these being the transition probabilities from any letter).
In each case (that is, for each (message, received letter) pair), a second
transmitted letter is determined by the function f. Since the letters are
uniform, each gives rise to a second set of letters with probabilities
q_1, q_2, ..., q_b. The probabilities are the same in all cases, although the
letters to which they apply may differ. Thus, for each message choice m_k,
there exists a set of possible received two-letter sequences with the same
set of probabilities, namely, all pairs q_i q_j. Continuing in this manner,
each message m_k when fully transmitted gives rise at the receiver to a set
of possible received words of length n with the same array of probabilities
(regardless of the particular message or the particular noise). These
probabilities are the set of all nth degree products of terms from
q_1, q_2, ..., q_b.
At the receiver, a received word must be decoded in a unique way.
The probability of error when message m_i is transmitted is the sum of the
above-mentioned transition probabilities to all words of length n which are
not decoded as m_i. If a_i received words are decoded as message m_i, then
Σ a_i = the total number of different received words of length n. If the
transition probabilities are arranged in monotone decreasing order, the
probability of error for message m_i is greater than or equal to the sum of
the terms in this decreasing sequence after term a_i, since the sum of the
first a_i terms of a monotone decreasing sequence overbounds the sum of any
other a_i terms. Thus, our estimate of P_e is decreased by taking the first
a_i terms for each message m_i.
Since the sequences for the different messages m_i are actually the
same, it is again decreased by equalizing, as nearly as possible, the
different a_i. This gives the simplest lower bound on P_e.
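The equalization argument can be checked on a small case. The Python sketch below is a minimal illustration under stated assumptions, not Shannon's own computation: a hypothetical list `q` holds the common transition probabilities of a uniform channel, all b^n received-word probabilities are enumerated (so only tiny n are practical), and the decoding sets are given equalized sizes a_i. The names `word_probabilities` and `sphere_packing_lower_bound` are illustrative.

```python
import itertools
import math


def word_probabilities(q, n):
    """Probabilities of all b**n received words of length n when every input
    letter has the same transition probabilities q (uniform input case),
    sorted in decreasing order."""
    probs = [math.prod(t) for t in itertools.product(q, repeat=n)]
    probs.sort(reverse=True)
    return probs


def sphere_packing_lower_bound(q, n, M):
    """Lower bound on P_e for M messages: message i gets a_i decoded words,
    sum(a_i) = b**n, and its error is at least the tail of the decreasing
    sequence after term a_i. The bound is smallest for equalized a_i."""
    probs = word_probabilities(q, n)
    N = len(probs)
    tail = [0.0] * (N + 1)  # tail[a] = sum of probs[a:]
    for k in range(N - 1, -1, -1):
        tail[k] = tail[k + 1] + probs[k]
    base, extra = divmod(N, M)  # 'extra' messages get base + 1 words
    return (extra * tail[min(base + 1, N)] + (M - extra) * tail[base]) / M


# Binary symmetric channel, crossover 0.1, length 3, two messages.
bound = sphere_packing_lower_bound([0.9, 0.1], 3, 2)
# Error probability of the repetition code {000, 111} with majority decoding.
repetition = 3 * 0.1**2 * 0.9 + 0.1**3
print(bound, repetition)  # both about 0.028
```

For the binary symmetric channel with crossover probability 0.1, length 3 and two messages, the bound coincides with the error probability of the repetition code {000, 111}, so that code meets it exactly.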
The more involved and sharper result, where the different classes of
received words are considered, follows by essentially the same argument,
on noticing that each transmitter choice gives rise to the different q_ij
transition probabilities and that the equalization may be carried out
within these classes as before, always reducing the estimate of P_e.
While it seems likely that the more general results, where the input
letters are not uniform (or slight modifications of these results), hold
for the feedback case, no proof has been found. There is, indeed, some
extra difficulty here because the transmitter can now take positive and
useful action depending on the result at the receiver of earlier parts
of the message. In the uniform input case, no very significant action is
possible, since all the letters are statistically alike so far as the
sphere-packing properties are concerned.
Theorem: Suppose in a channel words of length d can be partitioned into
e^{C_0 d} completely connected subsets, and we are given a code of length
n + d with M words and with probability of error P_eL. Then we can construct
a code of length n with at least (1/2) M e^{-C_0 d} words and with probability
of error
$$P_{es} \le 2\, p_{\min}^{-d} P_{eL},$$
where p_min is the smallest (nonvanishing) p_i(j) for the channel. If C_0 = 0
for the channel we can construct, more strongly, the code of length n with
M words and probability of error
$$P_{es} \le p_{\min}^{-d} P_{eL}.$$
Corollary: Let (R_1, E_1) be any point on the reliability curve for a channel.
Construct the straight line through this point and the point (C_0, log p_min^{-1}).
The reliability curve lies below or on this straight line for C_0 < R < R_1
and above or on it for R > R_1. In particular, it lies below the line
segment joining (C_0, log p_min^{-1}) and (C, 0), where C is the capacity of
the channel.
Proof: We will refer to the given code of length n + d as the long code
and to codes of length n derived from it as short codes. For simplicity
we will first consider the case where C_0 = 0. The short code is then
obtained by merely deleting the last d letters of each of the words in
the long code. Thus, in the long code, let us designate the words by
T_1U_1, T_2U_2, ..., T_MU_M,
where T_i denotes the first n letters and U_i the last d letters of the ith word.
These words correspond to the M different messages, and some of them
may consist of the same sequence of input letters (although in general for
a good code this would not be the case).
The short code consists of the words T_1, T_2, ..., T_M. The decoding
process for the short code will be maximum likelihood. Thus, if the received
words corresponding to the short code are V_1, V_2, ..., and V_j is received,
it is decoded as that T_i with the greatest conditional probability given V_j.
Since the T_i are used with equal probability, this is the T_i whose
probability of causing V_j is a maximum. We now show that the probability of
error in the short code when a particular V, say V_j, is received is less
than or equal to p_min^{-d} multiplied by the corresponding probability for
the long code, where we must, of course, consider all the possible received
signals W_1, W_2, ... corresponding to the U part of the long code. Let T_k
be the maximum likelihood detection for the short code when V_j is received,
and let U_k be the U part of the long code for T_k. Also let the long code
decoding system, when V_j and W_l are received, decode them as the message
m_r, that is, decide that T_rU_r was transmitted.
Since C_0 = 0, each pair of words of length d has a possible common
received word. In particular, for each i, U_i and U_k have a W in common.
Hence we can find a set of W's, W_1, W_2, ..., W_s, such that each one is a
possible result of U_k and of one or more of the U_i, and every U_i has
some W in the subset as a possible result. Now the error in the short
code (when V_j is received) is given by Σ_{i≠k} P_{V_j}(T_i), that is,
the probability of all other transmitted words except the maximum likelihood
one (conditional on the received V_j). Consider the probability of
error for the long code when V_j is received, and in particular those errors
resulting when U_i or U_k is received as W_l (say), W_l being the W common to
U_i and U_k. Either of U_i or U_k can cause W_l with probability greater than
or equal to p_min^d. Whether W_l is decoded as i or k (or some other way),
errors will occur with probability at least p_min^d P_{V_j}(T_i), since
P_{V_j}(T_k) ≥ P_{V_j}(T_i) (T_k being the maximum likelihood T). If there are
several U's leading into W_l, we will again have errors caused with probability
at least p_min^d Σ P_{V_j}(T_i), summed over this set of i, since if W_l is
decoded as one of the U_i, the larger P_{V_j}(T_k) takes its place. In total,
then, summing over all the W's in our selected subset, we get a total
probability of error for the long code, when V_j is received, of at least
$$p_{\min}^{d} \sum_{i \ne k} P_{V_j}(T_i) = p_{\min}^{d}\, P_{es}(V_j).$$
Summing this inequality over all V_j with the appropriate probabilities for
the V_j, we obtain the desired result
$$P_{es} \le p_{\min}^{-d} P_{eL}.$$
Now consider the case when C_0 > 0. We can subdivide the set of U's
into e^{C_0 d} subsets such that any two U's in the same subset are adjacent.
This subdivision partitions the M code words of the long code into
e^{C_0 d} subsets, giving e^{C_0 d} codes, to each of which the preceding
argument will apply. For each of these, therefore, the probability of error
for the short code is less than or equal to p_min^{-d} multiplied by the
probability of error for the corresponding part of the long code. By the
combinatorial argument used in connection with previous results, at least
half the code words are in codes of at least half the average size, and the
average error for these code words is not greater than 2 p_min^{-d} P_eL.
Hence, there exists among these a code containing at least (1/2) M e^{-C_0 d}
words and with probability of error P_es ≤ 2 p_min^{-d} P_eL.
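To prove the corollary, let the rate and the reliability of the
given long code be R_1 and E_1, so that
$$R_1 = \frac{1}{n+d}\log M, \qquad E_1 = -\frac{1}{n+d}\log P_{eL}.$$
Further, let the rate and reliability of the short code constructed from
this be R_2 and E_2. Then
$$R_2 \ge \frac{1}{n}\log M - \frac{1}{n}C_0 d - \frac{1}{n}\log 2 = (1+x)R_1 - xC_0 - \frac{1}{n}\log 2,$$
$$E_2 \ge -\frac{1}{n}\log P_{eL} + \frac{d}{n}\log p_{\min} - \frac{1}{n}\log 2 = (1+x)E_1 + x\log p_{\min} - \frac{1}{n}\log 2,$$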
where x = d/n. If now we consider a series of codes with increasing n
approaching the E_1 and R_1 of a point on the curve, then the last terms
above approach zero, and the E and R of the corresponding series of short
codes have a limit supremum on or above the straight line defined by the
equations
$$R_2 = (1+x)R_1 - xC_0, \qquad E_2 = (1+x)E_1 + x\log p_{\min}.$$
This straight line passes through the point (C_0, log p_min^{-1}) and the point
(R_1, E_1). The range x > 0, for which our statement is true, corresponds
to points to the right of (R_1, E_1) on the straight line. To the left of
(R_1, E_1) the reliability curve must be on or below this straight line, for
if it were above the line, say at (R_3, E_3), we could use this point in place
of (R_1, E_1) and obtain a higher value by the construction of these short
codes at the original rate R_1.
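The rate and reliability bookkeeping in this proof is plain arithmetic, and a short Python sketch can confirm it. Assuming natural logarithms and purely hypothetical values for n, d, C_0, p_min, R_1 and E_1, it builds the short code's guaranteed parameters from M and P_eL and compares them with the point on the straight line through (R_1, E_1) and (C_0, log p_min^{-1}) at x = d/n; the only gap should be the (1/n) log 2 terms. The function names are illustrative.

```python
import math


def short_code_point(M, P_eL, n, d, C0, p_min):
    """Rate and reliability (natural logs) guaranteed for the short code of
    length n built from a long code of length n + d with M words, error P_eL."""
    M_short = 0.5 * M * math.exp(-C0 * d)   # at least (1/2) M e^{-C0 d} words
    P_es = 2.0 * p_min ** (-d) * P_eL       # P_es <= 2 p_min^{-d} P_eL
    return math.log(M_short) / n, -math.log(P_es) / n


def line_point(R1, E1, x, C0, p_min):
    """Point with parameter x on the line through (R1, E1) and (C0, log 1/p_min)."""
    return (1 + x) * R1 - x * C0, (1 + x) * E1 + x * math.log(p_min)


# Hypothetical long code: n + d = 110 letters, rate 0.3, reliability 0.5.
n, d, C0, p_min = 100, 10, 0.05, 0.1
R1, E1 = 0.3, 0.5
M, P_eL = math.exp((n + d) * R1), math.exp(-(n + d) * E1)

R2, E2 = short_code_point(M, P_eL, n, d, C0, p_min)
R_line, E_line = line_point(R1, E1, d / n, C0, p_min)
print(R2, E2)          # short-code point
print(R_line, E_line)  # line point; each coordinate exceeds the above by log(2)/n
```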
This result, it may be noted, is very similar to Theorem ( ). Taken
together, they allow one to pass two straight lines through any given
point on the reliability curve, and assert that the curve lies within
one acute angle to the left of the given point and within the opposite
acute angle to the right.
A consequence of this construction is that E, regarded as a function
of R, is continuous at least for C_0 < R < C, and also that R, regarded as
a function of E, is continuous at least for 0 < E < E(C_0). This is evident
since, for any point inside these intervals, the straight-line upper and
lower bounds force E (or R) to approach the given point as R (or E) does so.
Theorem: If we have a code with M words, each of length n, and with
probability of error P_e, we can construct a code of at least (1/2) A^{-d} M
words of length n - d and with a probability of error P_e' ≤ 2 P_e, where
A is the number of distinct input letters.
Proof: Subdivide the M given words into A^d subsets according to the
first d letters. The first subset consists of all the code words
containing the first input letter in all of the first d positions. The
second subset contains the first letter in the first d - 1 positions and
the second letter in the dth position, and so on, lexicographically.
At least half of the original words must be in subsets with (1/2) A^{-d} M
or more members, for the total number of words in not more than A^d
subsets, each of size less than (1/2) A^{-d} M, is less than M/2, that is,
less than half the total. Hence the other half is in larger subsets. Now
consider these larger subsets. The average probability of error in the
original code for all words in these subsets is less than or equal to
2 P_e, since, if not, the average probability of error for all words
would be greater than P_e. The probability of error for these larger
subsets is a weighted average of the probabilities of error for the
individual larger subsets; hence, there exists an individual subset
with a probability of error less than or equal to 2 P_e. If these words
alone are used, the probability of error can only be improved, and if the
first d letters are deleted, the probability of error is unchanged.
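The subdivision argument is constructive, so it can be sketched directly. In this Python illustration the per-word error probabilities under the original decoder are assumed to be known (they are hypothetical inputs, not something the code computes), letters are coded as 0, ..., A - 1, and the function name `delete_prefix` is illustrative. It keeps only prefix classes with at least M/(2A^d) words, picks the one with the smallest average error, and strips the prefix.

```python
from collections import defaultdict


def delete_prefix(code, errors, d, A):
    """Sketch of the construction in the proof above.

    code   -- list of equal-length tuples of input letters 0..A-1
    errors -- error probability of each word under the original decoder
    Returns the shortened words of the chosen prefix class and the average
    original error over that class, which bounds the new code's error.
    """
    M = len(code)
    classes = defaultdict(list)
    for word, err in zip(code, errors):
        classes[word[:d]].append((word, err))
    # Classes with fewer than M / (2 A^d) words hold fewer than M/2 words in
    # total, so the remaining "large" classes are never empty.
    large = [c for c in classes.values() if len(c) >= M / (2 * A ** d)]
    best = min(large, key=lambda c: sum(e for _, e in c) / len(c))
    short_code = [word[d:] for word, _ in best]
    avg_error = sum(e for _, e in best) / len(best)
    return short_code, avg_error


# Hypothetical binary code with per-word error probabilities.
code = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
errors = [0.1, 0.2, 0.05, 0.3]
short_code, avg_error = delete_prefix(code, errors, d=1, A=2)
print(short_code, avg_error)
```

With these inputs the class with first letter 0 is chosen: its two words survive, meeting the guarantee (1/2)A^{-d}M = 1, and its average error 0.15 is below 2P_e = 0.325.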
If d = kn, E = -(1/n) log P_e and R = (1/n) log M, then we find for the
new code, as n → ∞,
$$R_1 \to \frac{1}{1-k}\,(R - k\log A), \qquad E_1 \to \frac{1}{1-k}\,E.$$
This means that on the E,R plot, if a straight line be passed through
the curve at E,R and through the point E = 0, R = log A, then the E,R
curve lies below (or on) the straight line to the right of the given
point and above (or on) the straight line to the left of the given point.
A Result for the Memoryless Feedback Channel
Theorem: Given a code for a memoryless feedback channel,
with block length n, probability of error P_e, and number of
messages M, we can find a code with block length n - d, probability
of error ≤ 2 P_e, and number of messages ≥ M/(2(ab p_max)^d), where
a is the number of letters in the input alphabet, b that for the
output alphabet, p_max the largest transition probability, and
d any desired integer from 0 to n.
Proof: For the given code consider the set of transmission
"starts" of length d. Input letter x_1, say, is received as y_1,
next x_2 is transmitted and received as y_2, etc., to x_{d-1} as y_{d-1},
and finally x_d as y_d. There are (ab)^d possible starts (x_1, y_1,
x_2, y_2, ..., x_d, y_d) of length d. In the given code let these
occur with probabilities q_1, q_2, ..., q_T (where T = (ab)^d).
Let the final probability of error for each of these be P_ei
(i = 1, 2, ..., T). Then Σ_i q_i P_ei = P_e. Using our combinatorial
lemma, there is at least one of these starts, α, with q_α ≥ 1/(2T)
and with P_eα ≤ 2 P_e. Any message which can cause a particular start
(such as start α) leads to this start with probability
g = p_{x_1}(y_1) p_{x_2}(y_2) ... p_{x_d}(y_d), where the x_i and y_i are
those for the start. The total probability of the start is then 1/M times
the number of messages that can cause the start times this product. For
start α this total probability is ≥ 1/(2T); hence the number of messages
must be at least
$$\frac{1/(2T)}{g/M} \ge \frac{M}{2T\,p_{\max}^{d}} = \frac{M}{2(ab\,p_{\max})^{d}}.$$
The code to be used, of length n - d, consists of the messages
in the group α, sending only the last n - d letters as though they
had started in the manner leading to start α. We have seen that
the number available is as stated in the theorem. If the detection
system used is that for the original code, and all received signals
not decoded as one of the messages in the group are counted as
errors, then the probability of error will be exactly P_eα ≤ 2 P_e.
A suitable distribution of these other received words can only
improve this value.
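The "combinatorial lemma" invoked here (and in several earlier proofs) is simple to state in code. This Python sketch assumes the start probabilities q_i and conditional error probabilities P_ei are given as lists; the numbers are hypothetical, with T = 4 corresponding, for instance, to a = b = 2 and d = 1. It also evaluates the message-count bound M/(2(ab p_max)^d). Function names are illustrative.

```python
def good_start(q, P):
    """Combinatorial lemma used in the proof above.

    q -- probabilities of the T possible starts (summing to 1)
    P -- conditional error probability given each start
    Returns an index alpha with q[alpha] >= 1/(2T) and P[alpha] <= 2 P_e,
    where P_e = sum(q[i] * P[i]).
    """
    T = len(q)
    # Starts with q < 1/(2T) carry total probability < 1/2, so the likely
    # starts carry at least 1/2 and their average error is at most 2 P_e;
    # the best likely start is no worse than that average.
    likely = [i for i in range(T) if q[i] >= 1 / (2 * T)]
    return min(likely, key=lambda i: P[i])


def messages_in_group(M, a, b, p_max, d):
    """Lower bound M / (2 (a b p_max)^d) on the messages that can cause the start."""
    return M / (2 * (a * b * p_max) ** d)


# Hypothetical start statistics for T = 4 starts.
q = [0.5, 0.3, 0.15, 0.05]
P = [0.2, 0.01, 0.5, 0.9]
P_e = sum(qi * pi for qi, pi in zip(q, P))
alpha = good_start(q, P)
print(alpha, P_e, messages_in_group(1000, 2, 2, 0.9, 1))
```

Here the start with index 1 is returned: its probability 0.3 exceeds 1/(2T) = 0.125, and its error 0.01 is far below 2P_e.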
Continuity of P_e,opt as a Function of Transition Probabilities
Theorem: The probability of error for the optimal code of length n in
the channel defined by p_i(j), that is, P_e,opt(p_i(j), n), is a continuous
function of p_i(j) in the region R defined by Σ_j p_i(j) = 1 (i = 1, 2, ..., a).
Proof: For a given finite number of input words and a finite number of
output words, there is a finite number of codes containing M words.
There is also only a finite number of decoding systems for each of these
codes; hence there is a finite number of complete systems. Let these
be numbered and let the probability of error for the ith one be
P_ei (i = 1, 2, ..., f). Then
$$P_{e,\mathrm{opt}}(p_i(j), n) = \min_i P_{ei}(p_i(j), n).$$
Each P_ei is a continuous function of p_i(j) in the region R. In fact each
P_ei is a multinomial in these probabilities, namely, 1/M times the sum
of the probabilities of each code word being carried to all the received
words which are not decoded as the word in question. The minimum of a
finite number of continuous functions is a continuous function, proving
the theorem. In fact, we may say more strongly that P_e,opt is made up
of a finite number of multinomials, each representing P_ei in a region
of the p_i(j) space.
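The proof identifies P_e,opt with a minimum over finitely many systems. A brute-force Python sketch makes this concrete on a one-parameter slice of the region R, namely the binary symmetric channel with crossover p; restricting to that channel is an assumption made only to keep the example small. For each fixed code the best decoder is maximum likelihood, so minimizing the ML error over all codes gives the minimum over all (code, decoder) systems. Function names are illustrative.

```python
import itertools


def bsc_ml_error(code, n, p):
    """Average error of a code (words given as n-bit integers) on a binary
    symmetric channel with crossover p under maximum likelihood decoding:
    1 - (1/M) * sum over received v of max over w of P(v | w)."""
    captured = 0.0
    for v in range(2 ** n):
        best = 0.0
        for w in code:
            flips = bin(w ^ v).count("1")
            best = max(best, p ** flips * (1 - p) ** (n - flips))
        captured += best
    return 1 - captured / len(code)


def optimal_error(n, M, p):
    """P_e,opt as a minimum over all codes with M words (repeats allowed).
    ML decoding is optimal for each fixed code, so this equals the minimum
    over all complete (code, decoder) systems."""
    return min(bsc_ml_error(code, n, p)
               for code in itertools.combinations_with_replacement(range(2 ** n), M))


for p in (0.1, 0.3, 0.5, 0.7):
    print(p, optimal_error(2, 2, p))
```

For n = 2 and M = 2 the minimum works out to min(p, 1 - p), a continuous function made of two polynomial pieces, as the theorem predicts.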
Codes of a Fixed Composition
Consider words of length n. Suppose an input word has λ_i n occurrences
of the ith letter (i = 1, 2, ..., a). Then we will call the vector λ_i the
composition of the word. Similarly, if an output word has μ_i n occurrences
of the ith letter (i = 1, 2, ..., b), then μ_i is the composition of this
output word. The number of different compositions of input words is
$$\binom{n+a-1}{a-1} \le n^{a}.$$
The number of different compositions of output words is similarly
$$\binom{n+b-1}{b-1} \le n^{b}.$$
We may consider, for a given channel, codes in which we artificially
restrict the input words to a particular composition, say λ_i. We can then
consider problems of finding the optimal code and its optimal probability
of error and reliability. This reliability we denote by E(R, λ_i, n). We will
now show the following:
$$\frac{1}{n}\log 2 + \max_{\lambda} E\Big(R - \frac{1}{n}\log(2n^{a}), \lambda_i, n\Big) \ge E(R, n) \ge \max_{\lambda} E(R, \lambda_i, n).$$
The right-hand relation is clear, since the right-hand member is the best
reliability for codes all of whose words have the same composition,
while E(R, n) is the best reliability with no such restriction and consequently
is at least as good. The left-hand relation is proved as follows. In a
code with e^{Rn} input words distributed over not more than n^a different
compositions, the average composition class has at least e^{Rn}/n^a input words.
Using a combinatorial principle previously proved,
at least half of the words are in composition classes which contain at
least half this average number of words. When a word is in such a class,
the probability of error is at least as great as if there were no other
input words (except those in the class) and the code were the best possible
for the number in the class. The probability of error would again be
reduced if the composition in question were that which has the smallest
probability of error for the given number of input words. Translating
into reliability and rate, the reliability for the cases at hand is not
greater than max_λ E(R - (1/n) log(2n^a), λ_i, n). Since these words occur at
least half the time, the reliability E(R, n) for the original code satisfies
the left inequality.
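Both counting facts used in this argument, the number of composition classes and the half-average principle, can be checked by enumeration. In this Python sketch words are tuples over letters 0, ..., a - 1, and the "code" for the second check is simply all binary words of length 4, a hypothetical choice made for illustration. Function names are illustrative.

```python
from itertools import product
from math import comb


def compositions(n, a):
    """All letter-count vectors (n*lambda_1, ..., n*lambda_a) for words of length n."""
    return [c for c in product(range(n + 1), repeat=a) if sum(c) == n]


def composition(word, a):
    """Composition class (letter counts) of a word over letters 0..a-1."""
    return tuple(word.count(i) for i in range(a))


# Count check: binom(n + a - 1, a - 1) classes, at most n^a of them.
for n, a in [(3, 2), (5, 3), (8, 4)]:
    print(n, a, len(compositions(n, a)), comb(n + a - 1, a - 1), n ** a)

# Half-average principle on all binary words of length 4 (hypothetical code).
words = list(product(range(2), repeat=4))
classes = {}
for w in words:
    classes.setdefault(composition(w, 2), []).append(w)
average = len(words) / len(classes)
in_big = sum(len(c) for c in classes.values() if len(c) >= average / 2)
print(len(classes), average, in_big)
```

The five classes have sizes 1, 4, 6, 4 and 1; with an average of 3.2, the classes of at least 1.6 words hold 14 of the 16 words, comfortably more than half.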
Relation of P_e to ρ
Theorem: Suppose a particular code has e^{Rn} words and the distribution
function for the information I is ρ(x) (the words being used with equal
probability). Then the optimal detection system for this code gives a
probability of error P_e satisfying the inequalities
$$\frac{1}{2}\rho\Big(R - \frac{1}{n}\log 2\Big) \le P_e \le \rho\Big(R - \frac{1}{n}\log 2\Big).$$
Proof: We first prove the lower bound. By definition of the function
ρ, ρ(R - (1/n) log 2) is the probability that
$$\frac{1}{n}\log\frac{p(u,v)}{p(u)p(v)} < R - \frac{1}{n}\log 2,$$
that is,
$$\frac{p(u,v)}{p(u)p(v)} < \frac{1}{2}e^{nR},$$
or (using the fact that p(u) = e^{-nR})
$$p_v(u) < \frac{1}{2}.$$
Now fix attention on those pairs (u, v) for which this inequality p_v(u) < 1/2
is true, and imagine the corresponding (u, v) lines to be marked in black
and all other (u, v) connecting lines marked in red. We divide the v
points into two classes: C_1 consists of those v's which are decoded into
u's connected by a red line (and also any v's which are decoded into u's
not connected to the v's); C_2 consists of v's which are decoded into u's
connected by a black line. We have established that with probability
ρ(R - (1/n) log 2) the (u, v) pair will be connected by a black line. The
v's involved will fall into the two classes C_1 and C_2 with probabilities
p_1, say, and p_2 = ρ(R - (1/n) log 2) - p_1. Whenever the v is in C_1 an error
is produced, since the actual u was one connected by a black line and the
decoding is along a red line (or to a disconnected u). Thus these cases
give rise to a probability p_1 of error. When the v in question is in
class C_2, we have p_v(u) < 1/2 for the decoded u. This means that with at
least an equal probability these v's can be obtained through other u's than
the one in question. If we sum, for these v's, the probabilities of all pairs
p(u, v) except that corresponding to the decoding system, then we will have a
probability at least p_2/2, and all of these cases correspond to incorrect
decoding. In total, then, we have a probability of error given by
$$P_e \ge p_1 + \frac{p_2}{2} \ge \frac{1}{2}\rho\Big(R - \frac{1}{n}\log 2\Big).$$
We now prove the upper bound. Consider the decoding system defined
as follows. If for any received v there exists a u such that p_v(u) > 1/2,
then the v is decoded into that u. Obviously there cannot be more than
one such u for a given v, since the sum of these would imply a probability
greater than one. If there is no such u for a given v, the decoding is
irrelevant to our argument. We may, for example, let such v's all be decoded
into the first word in the input code. The probability of error, with this
decoding, is then less than or equal to the probability of all (u, v)
pairs for which p_v(u) ≤ 1/2. That is,
$$P_e \le \sum_{S} p(u,v) \qquad \text{(where } S \text{ is the set of pairs } (u,v) \text{ with } p_v(u) \le \tfrac{1}{2}\text{)}.$$
The condition p_v(u) ≤ 1/2 is equivalent to p(u,v)/p(v) ≤ 1/2, or, again, to
$$\frac{p(u,v)}{p(u)p(v)} \le \frac{1}{2p(u)} = \frac{1}{2}e^{nR}.$$
This is equivalent to the condition
$$\frac{1}{n}\log\frac{p(u,v)}{p(u)p(v)} \le R - \frac{1}{n}\log 2.$$
The sum Σ p(u, v) over the pairs where this is true is, by definition, the
distribution function of (1/n) log [p(u,v)/(p(u)p(v))] evaluated at
R - (1/n) log 2, that is,
$$P_e \le \sum_{S} p(u,v) = \rho\Big(R - \frac{1}{n}\log 2\Big).$$
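The two bounds of this theorem can be verified exactly on small binary symmetric channel codes. This Python sketch uses exact rational arithmetic (`fractions.Fraction`) so that the boundary case p_v(u) = 1/2 is handled without rounding. It assumes distinct, equiprobable code words, and the codes and the crossover probability 1/10 are hypothetical examples. Because p(u) = e^{-nR}, the threshold on the information density is evaluated through the equivalent condition on p_v(u). Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def bsc_likelihood(u, v, p):
    """P(v | u) for a binary symmetric channel with crossover probability p."""
    flips = sum(a != b for a, b in zip(u, v))
    return p ** flips * (1 - p) ** (len(u) - flips)


def error_and_rho(code, p):
    """Exact ML error P_e of an equiprobable code, with
    rho_lt = Pr{p_v(u) < 1/2} and rho_le = Pr{p_v(u) <= 1/2}.
    Since p(u) = e^{-nR}, these events are the same as the information
    (1/n) log p(u,v)/(p(u)p(v)) being < or <= R - (1/n) log 2."""
    M, n = len(code), len(code[0])
    half = Fraction(1, 2)
    correct = rho_lt = rho_le = Fraction(0)
    for v in product((0, 1), repeat=n):
        joint = [Fraction(1, M) * bsc_likelihood(u, v, p) for u in code]
        p_v = sum(joint)
        correct += max(joint)          # maximum likelihood decoding
        for pj in joint:
            if pj / p_v < half:
                rho_lt += pj
            if pj / p_v <= half:
                rho_le += pj
    return 1 - correct, rho_lt, rho_le


p = Fraction(1, 10)
for code in ([(0, 0, 0), (1, 1, 1)], [(0, 0), (1, 1)]):
    P_e, rho_lt, rho_le = error_and_rho(code, p)
    print(code, P_e, rho_lt / 2 <= P_e <= rho_le)
```

For the length-3 repetition code the upper bound is attained (P_e = ρ = 7/250). The length-2 code has received words with p_v(u) exactly 1/2, which is why the strict and non-strict versions of ρ differ there.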
Bound on P_e for Random Code by Simple Threshold Argument
Theorem: Suppose some p(u) for input words u of length n gives
rise to an information distribution ρ(I). Then, given any R and any ε > 0,
there exists a selection of e^{nR} input words and a decoding
system such that, if these words are used with equal probability,
the probability of error P_e is bounded by
$$P_e \le \rho(R + \epsilon) + \frac{1}{2}e^{-n\epsilon}.$$
Proof: For a given R and ε consider the pairs (u, v) of
input and output words, and define the set S to consist of those
pairs for which
$$\log\frac{p(u,v)}{p(u)p(v)} > n(R + \epsilon).$$
Thinking of the u's and v's as two sets of points with connecting lines
between them, we can imagine the set of lines corresponding to the
set S to be colored red. When the u's are chosen with probabilities
p(u), the probability that the (u, v) pair will belong to the set S
is, by definition of ρ, equal to 1 - ρ(R + ε).
Now consider the ensemble of signalling codes obtained
in the following manner. The integers 1, 2, 3, ..., M = e^{nR}
are associated independently with the different possible input
sequences u_1, u_2, ..., u_B with probabilities p(u_1), p(u_2), ...,
p(u_B). This produces an ensemble of codes, each using M (or fewer)
input words. If there are B different input words u_i, there
will be exactly B^M different codes in this ensemble, corresponding
to the B^M different ways we can associate M integers with B input
words. These codes have different probabilities. Thus the
(highly degenerate) code in which all integers are mapped into
input word u_1 has probability p(u_1)^M. A code in which d_k of the
integers are mapped into u_k has probability Π_k p(u_k)^{d_k}. We will
be concerned with an average probability of error for this
ensemble of codes. By this we mean the average probability
of error when these codes are weighted according to the
probabilities we have just defined. We imagine that in using
one of these codes each integer is used with probability 1/M.
Note that for some particular selections, several integers may fall
on the same input word. This input word is then used with higher
probability than the others.
In any particular code of the ensemble, our decoding
procedure will be defined as follows. If a received v sequence
has no red line coming into it (for this v there is no (u, v)
pair in the set S), then we decode (conventionally) as message
1. If there is exactly one integer mapped into a u connected
by a red line to this v, we decode as the corresponding
integer. If there is more than one such integer, we decode
as the smallest such integer.
With any particular code in this ensemble, the probability
of using the different u_i will not, in general, be given by
p(u_i). However, if we average over the full ensemble, then each
u_i will be used with the probability p(u_i), since integers were
mapped into u_i in constructing the ensemble with just this
probability. This means that in the ensemble average a pair
(u, v) will also occur with the probability p(u, v).
Now let us compute the average probability of error in
this full ensemble of codes. In the ensemble a (u, v) pair will
not belong to the set S with probability ρ(R + ε). We
suppose, pessimistically, that each case of this sort produces
an error. The remaining 1 - ρ(R + ε) of the time, the (u, v)
pair does belong to the set S, and consequently
$$\log\frac{p(u,v)}{p(u)p(v)} > n(R + \epsilon),$$
$$p_v(u) > p(u)\,e^{n(R+\epsilon)}.$$
Fixing v at v_i, say, we now sum this inequality over all u's such
that (u, v_i) belongs to the set S. This subset of u's we call S_i.
Thus we obtain
$$\sum_{u \in S_i} p_{v_i}(u) > e^{n(R+\epsilon)} \sum_{u \in S_i} p(u).$$
Now the left member is clearly less than or equal to one, it
being the conditional probability that v_i was caused by a
member of S_i. The sum in the right member we will denote
Q_i. It is the total unconditional probability for all
members of S_i, that is, for all u's connected to v_i by red lines.
Using these facts we obtain
$$Q_i < e^{-n(R+\epsilon)}.$$
Now consider the conditional probability, in the ensemble
of codes, of an error in decoding when v_i is received and the
correct message is connected to this v_i by a red line. This
probability P_ei is given by
$$P_{ei} = \frac{\sum_{K=1}^{M} \binom{M}{K} Q_i^{K}(1-Q_i)^{M-K}\,K\cdot\frac{K-1}{K}}{\sum_{K=1}^{M} \binom{M}{K} Q_i^{K}(1-Q_i)^{M-K}\,K}.$$
The reason for this is that, conditional on the given information,
the probability in the ensemble of the result being caused by a code
with exactly K integers coded into the S_i subset is
$$\frac{\binom{M}{K} Q_i^{K}(1-Q_i)^{M-K}\,K}{\sum_{K'=1}^{M} \binom{M}{K'} Q_i^{K'}(1-Q_i)^{M-K'}\,K'}.$$
In the case of such a code, the probability of error in decoding
is (K - 1)/K. Multiplying by this and summing on K gives the
P_ei expression above. This may be evaluated easily by noting
that the denominator is the expectation of a binomial, MQ_i, while the
numerator is this same expectation less
Σ_{K=1}^{M} C(M, K) Q_i^K (1 - Q_i)^{M-K} = 1 - (1 - Q_i)^M.
Hence, we have
$$P_{ei} = \frac{MQ_i - 1 + (1-Q_i)^{M}}{MQ_i} = \frac{1}{2}(M-1)Q_i - \frac{1}{6}(M-1)(M-2)Q_i^{2} + \cdots \le \frac{1}{2}(M-1)Q_i < \frac{1}{2}e^{-n\epsilon},$$
provided e^{-nε} < 1, since then MQ_i < 1 and the alternating
binomial expansion is decreasing in absolute value, and hence
may be overestimated by dropping the terms after (1/2)(M - 1)Q_i.
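Now since P_ei < (1/2) e^{-nε} for each i, the probability
of error when the (u, v) pair belongs to set S is less than
(1/2) e^{-nε}. Hence, the unconditional probability of error over
the ensemble of codes satisfies
$$P_e \le \rho(R+\epsilon) + \big(1 - \rho(R+\epsilon)\big)\,\frac{1}{2}e^{-n\epsilon} \le \rho(R+\epsilon) + \frac{1}{2}e^{-n\epsilon}.$$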
This being the average probability of error over the ensemble
of codes, there must be at least one particular code in the
ensemble with a probability of error this low. This proves
the theorem. More generally, one may say that at least
half the codes in the ensemble have a probability of error
less than twice this bound, and at least a fraction δ have
a probability of error less than 1/(1 - δ) times this bound.
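The closed form for P_ei and the bound (1/2)(M - 1)Q_i can be checked numerically. This Python sketch computes P_ei both from its defining ratio of binomial sums and from the closed form, for hypothetical values M = 100 and Q_i = 10^{-3}, chosen so that MQ_i < 1. Function names are illustrative.

```python
from math import comb


def p_error_direct(M, Q):
    """P_ei from its definition: K ~ Binomial(M, Q) integers fall in S_i,
    weighted by K (the correct integer is among them), error (K - 1)/K."""
    weights = [comb(M, K) * Q ** K * (1 - Q) ** (M - K) for K in range(M + 1)]
    numerator = sum(w * (K - 1) for K, w in enumerate(weights) if K >= 1)
    denominator = sum(w * K for K, w in enumerate(weights))
    return numerator / denominator


def p_error_closed(M, Q):
    """Closed form (M Q - 1 + (1 - Q)^M) / (M Q)."""
    return (M * Q - 1 + (1 - Q) ** M) / (M * Q)


M, Q = 100, 1e-3   # hypothetical values with M Q < 1
direct, closed = p_error_direct(M, Q), p_error_closed(M, Q)
print(direct, closed, 0.5 * (M - 1) * Q)
```

Both evaluations agree at about 0.048, below the bound 0.0495.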
A Bound on P_e for a Random Code
Theorem: Given a distribution p(u) for input words of length n which
produces the information distribution ρ(x), the random ensemble of
codes with e^{Rn} words based on p(u) has an average probability of error
satisfying
$$P_e \le \rho(R) + e^{Rn}\int_R^{\infty} e^{-nx}\,d\rho(x) = n\,e^{Rn}\int_R^{\infty}\rho(x)\,e^{-nx}\,dx.$$
Hence there exist particular codes with e^{Rn} members and this probability
of error.
Proof: Construct the random ensemble of codes, each code having e^{Rn}
members and based on the given input distribution p(u). We wish to
calculate a bound for the average probability of error over this ensemble.
In the ensemble, pairs (u, v) of transmitted and received words occur
with the same probabilities as in the original situation produced by giving
the input words probabilities p(u). We calculate the error probability
by an integration on the variable x occurring in the information
distribution ρ(x). The probability that a (u, v) pair is such as to give an
x lying in the interval (x_i, x_{i+1}] is ρ(x_{i+1}) - ρ(x_i). For such a
(u, v) pair
$$x_i < \frac{1}{n}\log\frac{p(u,v)}{p(u)p(v)} \le x_{i+1}$$
or
$$p(u)\,e^{nx_i} < p_v(u) \le p(u)\,e^{nx_{i+1}}.$$
If we sum the left terms of this inequality over all words u in a set S, say,
with greater conditional probability, we obtain
$$e^{nx_i}\sum_{S} p(u) < \sum_{S} p_v(u) \le 1,$$
$$Q = \sum_{S} p(u) < e^{-nx_i},$$
since the total probability of set S conditional on v cannot exceed 1.
Our detection system will be to choose among the possible words in a
particular code, when v is received, that one for which p_v(u) (in the
original probability system) is greatest. A (u, v) pair in the interval
(x_i, x_{i+1}] will be safe in a code in the ensemble if the set S for that v
is empty (apart from the particular u which produced the pair). In the
ensemble, the probability of error from a (u, v) pair may be calculated as
in the simple threshold case. We obtain from that argument an upper bound
$$\frac{1}{2}e^{Rn}Q < e^{Rn}e^{-nx_i} = e^{n(R - x_i)}.$$
We can now overestimate the probability of error by summing the
probability of the (u, v) pair being in the interval (x_i, x_{i+1}] multiplied
by the probability of words in the set S for such a case. Note also that
our bound for the latter is one for x = R; for smaller x's we use the bound
one rather than e^{n(R-x)}. As the intervals (x_i, x_{i+1}] approach zero
length, our bound approaches the integral form:
$$P_e \le \rho(R) + \int_R^{\infty} e^{n(R-x)}\,d\rho(x).$$
Integrating by parts,
$$P_e \le \rho(R) + \Big[e^{n(R-x)}\rho(x)\Big]_R^{\infty} + n\,e^{Rn}\int_R^{\infty}\rho(x)\,e^{-nx}\,dx = n\,e^{Rn}\int_R^{\infty}\rho(x)\,e^{-nx}\,dx.$$
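The integration by parts can be checked on a discrete information distribution, for which both forms reduce to finite sums. The step distribution function in this Python sketch (`atoms` and `weights`) is hypothetical and serves only to exercise the identity; ρ is taken right-continuous, and R is chosen so as not to coincide with an atom. Function names are illustrative.

```python
import math


def rho_step(atoms, weights, x):
    """Right-continuous distribution function of a discrete information variable."""
    return sum(w for a, w in zip(atoms, weights) if a <= x)


def stieltjes_form(atoms, weights, R, n):
    """rho(R) + integral over (R, inf) of e^{n(R-x)} d rho(x): a sum over atoms."""
    return rho_step(atoms, weights, R) + sum(
        w * math.exp(n * (R - a)) for a, w in zip(atoms, weights) if a > R)


def integrated_form(atoms, weights, R, n):
    """n e^{Rn} * integral over [R, inf) of rho(x) e^{-nx} dx, computed exactly
    because rho is constant between consecutive atoms (atoms sorted)."""
    breaks = [R] + [a for a in atoms if a > R]
    total = 0.0
    for lo, hi in zip(breaks, breaks[1:] + [math.inf]):
        tail = 0.0 if hi == math.inf else math.exp(-n * hi)
        total += rho_step(atoms, weights, lo) * (math.exp(-n * lo) - tail) / n
    return n * math.exp(R * n) * total


# Hypothetical step distribution for the information per letter.
atoms, weights = [0.1, 0.25, 0.4, 0.6], [0.1, 0.2, 0.3, 0.4]
for R in (0.05, 0.3, 0.5):
    print(R, stieltjes_form(atoms, weights, R, 50), integrated_form(atoms, weights, R, 50))
```

The two forms agree to rounding error for each R tried.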
Corollary: Under the conditions of the theorem, suppose a maximum for
x ≥ R of ρ(x) e^{-nx} occurs at x = R_m. Then
$$P_e \le e^{(R - R_m)n}\rho(R_m)\Big[\log\big(e\,\rho(R_m)^{-1}\big) + n(R_m - R)\Big].$$
In particular, if R_m = R,
$$P_e \le \rho(R)\log\big(e\,\rho(R)^{-1}\big).$$
Proof: Using the second formula in the theorem, we have
$$P_e \le n\,e^{Rn}\int_R^{\infty} e^{-nx}\rho(x)\,dx.$$
The maximum of the integrand, by the conditions of the corollary, is
e^{-nR_m} ρ(R_m). We also have an upper bound e^{-nx} for the integrand,
since ρ(x) ≤ 1. These two bounds cross at x = a, where a satisfies
$$e^{-na} = e^{-nR_m}\rho(R_m).$$
Replacing the integrand by e^{-nR_m} ρ(R_m) for x ≤ a and by e^{-nx} for x > a,
we obtain the upper bound for P_e
$$P_e \le e^{(R - R_m)n}\rho(R_m)\Big[\log\big(e\,\rho(R_m)^{-1}\big) + n(R_m - R)\Big].$$
Setting R_m = R gives the second bound.
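The corollary's bound can be compared against the exact integral for a hypothetical step distribution used purely as a test case. In this Python sketch R_m is found by checking R and the atoms above it, which suffices because a step function times e^{-nx} attains its supremum on each step at the left end of the step. Function names are illustrative.

```python
import math


def rho_step(atoms, weights, x):
    """Right-continuous distribution function of a discrete information variable."""
    return sum(w for a, w in zip(atoms, weights) if a <= x)


def exact_integral_bound(atoms, weights, R, n):
    """n e^{Rn} * integral over [R, inf) of rho(x) e^{-nx} dx (exact for a step rho)."""
    breaks = [R] + [a for a in atoms if a > R]
    total = 0.0
    for lo, hi in zip(breaks, breaks[1:] + [math.inf]):
        tail = 0.0 if hi == math.inf else math.exp(-n * hi)
        total += rho_step(atoms, weights, lo) * (math.exp(-n * lo) - tail) / n
    return n * math.exp(R * n) * total


def corollary_bound(atoms, weights, R, n):
    """e^{(R - R_m) n} rho(R_m) [log(e / rho(R_m)) + n (R_m - R)], where R_m
    maximizes rho(x) e^{-nx} over x >= R (attained at R or at an atom above R)."""
    candidates = [R] + [a for a in atoms if a > R]
    R_m = max(candidates, key=lambda x: rho_step(atoms, weights, x) * math.exp(-n * x))
    r = rho_step(atoms, weights, R_m)
    return math.exp((R - R_m) * n) * r * (math.log(math.e / r) + n * (R_m - R))


atoms, weights = [0.1, 0.25, 0.4, 0.6], [0.1, 0.2, 0.3, 0.4]   # hypothetical
for R in (0.05, 0.3):
    print(R, exact_integral_bound(atoms, weights, R, 50), corollary_bound(atoms, weights, R, 50))
```

At R = 0.3 the maximum occurs at R itself, so the result reduces to the special case ρ(R) log(eρ(R)^{-1}).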
The Feinstein Bound
It is interesting to compare these results with the bound on the
probability of error found by Feinstein. Using a different method of
proving the coding theorem for a noisy channel, he found the following
upper bound for the probability of error:
$$P_e \le U = \Big[e^{-n(C - R - \epsilon_1 - \epsilon_2)/2} + (\delta_1 + \delta_2)^{1/2}\Big]^{2},$$
in which
n = block length of the code
C = channel capacity
R = (1/n) log (number of code words)
δ_1 can be taken to be Prob{|H(X|Y) + (1/n) log p(u|v)| > ε_1}
δ_2 can be taken to be Prob{|H(X) + (1/n) log p(u)| > ε_2}.
In using the values above for δ_1 and δ_2 we are using the most favorable
values to give a low bound on P_e. The bound U above may be approximated
within a factor of 2 by a somewhat simpler expression as follows:
$$e^{-n(C-R-\epsilon_1-\epsilon_2)} + \delta_1 + \delta_2 \le U \le 2\Big[e^{-n(C-R-\epsilon_1-\epsilon_2)} + \delta_1 + \delta_2\Big].$$
The left inequality is obtained by squaring the expression for U and
dropping the necessarily positive middle term. The right inequality
follows from noting that 2AB ≤ A^2 + B^2, so that U is increased by deleting
the middle term and doubling the squared terms.
The bound is somewhat simplified in the case where p(u), the probability
of input word u needed to achieve channel capacity, is constant at
2^{-nH(X)}. We then have δ_2 = 0 and ε_2 = 0. This situation occurs, for
example, in channels with uniform input letters, as we have seen previously,
and in particular in the binary symmetric channel. In these cases, the
inequalities simplify to
$$e^{-n(\Delta - \epsilon_1)} + \delta_1 \le U \le 2\Big(e^{-n(\Delta - \epsilon_1)} + \delta_1\Big),$$
where we define Δ = C - R as the discrepancy between channel capacity and
rate for the code. Note also in this case that
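$$\delta_1 = \Pr\Big\{\Big|H(X|Y) + \frac{1}{n}\log p(u|v)\Big| > \epsilon_1\Big\}$$
$$= \Pr\Big\{\Big|H(X|Y) - H(X) + \frac{1}{n}\log\frac{p(u,v)}{p(u)p(v)}\Big| > \epsilon_1\Big\}$$
$$\ge \Pr\Big\{\frac{1}{n}\log\frac{p(u,v)}{p(u)p(v)} < C - \epsilon_1\Big\}$$
$$= \rho(C - \epsilon_1),$$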
where ρ is the distribution function for information that we have used
previously. Making the change of variable ε_1 = Δ - θ, the inequalities
for U become
$$e^{-n\theta} + \rho(R + \theta) \le U \le 2\Big(e^{-n\theta} + \delta_1\Big).$$
This may be compared with the inequality ( ) found for the random code
by the simple threshold method. It will be seen that they are within at
worst a factor of 2 of each other. Since the bound ( ) leads in the
binary symmetric channel to a reliability bound considerably poorer than
the true reliability curve, the same may be said of the Feinstein bound.
We have made no approximations in estimating the reliability bound from
the inequality obtained by Feinstein. It follows that either the type
of code (or, more precisely, the poorest code that can be constructed by
his method) is considerably poorer in reliability than the random code, or
else the bound ( ) is a relatively poor estimate of the error
probability of these codes (that is, the approximations made prior to
this formula were sufficiently crude to cause this difference in the
reliability bounds). Which of these is actually the case we have not
determined.
Relations Between Reliability and Minimum Word Separation
In this section we prove some results relating probability of error
to the minimum separation between words in the code. These results
show that when the signalling rate R is very small, the reliability is
approximately the minimum separation. As a consequence, to obtain a good
code for R near zero, the essential feature is to choose a set of code
words such that the minimum separation between any pair is as large as
possible.
Theorem: For any code with rate R and maximum likelihood detection,
$$\Delta_{\min} - R \le -\frac{1}{n}\log P_e \le \Delta_{\min} + R - \frac{1}{n}\log 2,$$
where $\Delta_{\min}$ is the minimum separation between words of the code. Hence,
for any code sequence with rate approaching zero and maximum likelihood
detection, the reliability approaches the minimum separation.
Corollary:
$$E(0) = \lim_{R\to 0}\,\limsup_{n\to\infty}\,\max_{\text{codes of rate } R}\Delta_{\min}.$$
Proof: Let two words at minimum distance be $W_1$ and $W_2$. The probability
of error for the code is certainly at least $2/M$ times the probability of error
when $W_1$ or $W_2$ is used, since $2/M$ of the time one or the other of these will
occur. This latter probability is certainly at least what it would be if
none of the other words (except $W_1$ and $W_2$) were present and the detection
were by maximum likelihood. This last is $e^{-n\Delta_{\min}}$. Thus
$$P_e \ge \frac{2}{M}\,e^{-n\Delta_{\min}}.$$
Taking the logarithm and dividing by n, we obtain the upper bound.
The lower bound is obtained by noting that the probability of error
when a particular word is transmitted can be bounded by summing the
probabilities of its being interpreted as each other word. These terms are
overestimated by taking each other word to be at separation $\Delta_{\min}$ and
adding these contributions. This amounts to adding $M(M-1)/2$
contributions (one for each pair of words) and giving each the value just
obtained, $(2/M)e^{-n\Delta_{\min}}$, for the worst pair; thus
$$P_e \le \frac{2}{M}\cdot\frac{M(M-1)}{2}\,e^{-n\Delta_{\min}} < e^{nR}\,e^{-n\Delta_{\min}}.$$
By taking the logarithm and dividing by n we obtain the desired result.
Since for $R \to 0$ the two bounds converge to $\Delta_{\min}$, the second statement
of the theorem is true. The corollary results on combining the theorem with
the definition of the reliability function E.
Corollary: Let $\Delta_{\min}(R,n)$ be the minimum separation between words in
the code of rate R and block length n which maximizes this minimum distance
for a given channel. Then the reliability characteristic E(R) for the
channel satisfies
$$\limsup_{n\to\infty}\Delta_{\min}(R,n) - R \le E(R) \le \limsup_{n\to\infty}\Delta_{\min}(R,n) + R.$$
Proof: For the right inequality, note that for any sequence of codes $S_n$
of increasing block length n, $\Delta_{\min}(S_n) \le \Delta_{\min}(R,n)$ (since $\Delta_{\min}(R,n)$ is
the largest possible $\Delta_{\min}$ for the given R and n). Hence for sufficiently
large n, all $\Delta_{\min}$ in the sequence are less than $\limsup_{n\to\infty}\Delta_{\min}(R,n) + \epsilon$ (for any
positive $\epsilon$). Now, using the theorem (and noting that $\frac{1}{n}\log 2 \to 0$), we
obtain $E \le \limsup\Delta_{\min}(R,n) + R + \epsilon$. This being true for any positive $\epsilon$, it is
true for $\epsilon = 0$.
The left inequality also follows easily from the theorem. Take a subsequence
from the sequence of codes giving $\Delta_{\min}(R,n)$ which actually approaches
$\limsup\Delta_{\min}(R,n)$. Applying the lower bound of the previous theorem to this
subsequence of codes, we obtain the left inequality above.
Our next result shows that by selecting our codes the R in the upper
bounds of these results can be eliminated.
Theorem: Given a code sequence approaching rate R and reliability E,
there exists an expurgated subsequence approaching the same rate R and
reliability E and with $E \le \lim_{n\to\infty}\Delta_{\min}(n)$, where $\Delta_{\min}(n)$ is the minimum
separation between words in the nth code in the expurgated subsequence.
Proof: For any given $\Delta$ perform the following operation. Delete, in
each code of the given sequence, one of the two points which are closest
together (provided their separation is less than or equal to $\Delta$). Next,
delete one of the points in the resulting code which are closest together,
and so on up to the point at which no points remain with a separation less
than or equal to $\Delta$. This is done for all the codes in the sequence. For
each $\Delta$, either there exists an $\epsilon > 0$ for which an infinite subsequence of
the codes remaining have a fraction at least $\epsilon$ of the original points left,
or such an $\epsilon$ does not exist. This divides values of $\Delta$ into two Dedekind
classes and gives a division point $\Delta_0$ such that for $\Delta < \Delta_0$ the $\epsilon$
exists and for $\Delta > \Delta_0$ it does not.
Choose any small interval $\delta > 0$ and consider the code sequence resulting
for $\Delta = \Delta_0 - \delta$. The rate for this sequence is at least $R + \frac{1}{n}\log\epsilon$ and
hence approaches R as $n \to \infty$. Furthermore, almost all points in the codes
remaining in the sequence have a nearest neighbor in the interval $\Delta_0 - \delta$ to
$\Delta_0 + \delta$, by the construction of $\Delta_0$. Finally, the E for these codes must be the
same as the original E, since errors due to points retained are only increased at
most in the ratio $1/\epsilon$, due to increased usage of these points. This will not
affect E. This code sequence is then ideal and close to uniform in nearest-neighbor
separation: almost all points have a nearest neighbor between
$\Delta_0 - \delta$ and $\Delta_0 + \delta$, and $\delta$ is arbitrarily small. The argument about $P_e$ given in
the preceding theorem can now be improved, since almost all points have such
a near neighbor. Thus we get the inequality without the R term,
$$E \le \Delta_0 + \delta$$
for any $\delta > 0$, and hence we can obtain a subsequence for which $E \le \lim\Delta_{\min}(n)$,
as stated in the theorem.
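The deletion step in this proof is a greedy procedure, and a small Python sketch may make it concrete. It assumes, purely for illustration, that separation is measured by Hamming distance between binary words (the text's separation is defined through the pairwise error probability, not Hamming distance); the helper names `hamming`, `expurgate` and `rate_bits` are hypothetical.

```python
import math
from itertools import combinations

def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def expurgate(code, threshold, dist=hamming):
    """Greedy deletion as in the proof (illustrative sketch).

    While some pair of remaining words is at distance <= threshold,
    delete one member of a closest pair. Every surviving pair is then
    more than `threshold` apart.
    """
    words = list(code)
    while len(words) > 1:
        # Locate a closest pair among the words that remain.
        (i, j), d = min(
            (((i, j), dist(words[i], words[j]))
             for i, j in combinations(range(len(words)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        del words[j]  # drop one member of the closest pair
    return words

def rate_bits(words, n):
    """Rate (1/n) log2 M of a code with M words of length n."""
    return math.log2(len(words)) / n
```

For example, `expurgate(["00000", "00001", "00011", "11100", "11111", "01010"], 2)` keeps only `["00000", "11100"]`, trading rate for a larger minimum distance; the proof controls exactly this trade-off through the retained fraction $\epsilon$.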
Inequalities for Decodable Codes
Consider codes of the following sort. There are a basic letters
and s words $W_1, W_2, \ldots, W_s$ formed of sequences of the letters. These
words have lengths $\ell_1, \ell_2, \ldots, \ell_s$ (not necessarily equal). The code
is supposed to be decodable, by which we mean that any finite sequence of
letters can be broken down into words in at most one way.
Theorem: For such a decodable code we have
$$\sum_{i=1}^{s} a^{-\ell_i} \le 1 \qquad (1)$$
and
$$\sum_{i=1}^{s} p_i\ell_i \ge -\sum_{i=1}^{s} p_i\log_a p_i, \qquad (2)$$
where the $p_i$ are any set of non-negative numbers such that $\sum_i p_i = 1$.
Proof: The two inequalities are proved in very similar fashion. We prove
(2) first. Choose a set of rational numbers $q_i$ whose sum is one and
which are close approximations to the $p_i$, so close that
$$\Big|\sum_i q_i\ell_i - \sum_i p_i\ell_i\Big| < \epsilon \quad\text{and}\quad \Big|\sum_i q_i\log_a q_i^{-1} - \sum_i p_i\log_a p_i^{-1}\Big| < \epsilon. \qquad (3)$$
This is possible for any $\epsilon > 0$, since both $\sum_i q_i\ell_i$ and $\sum_i q_i\log_a q_i^{-1}$ are
continuous functions of the $q_i$ in the range of allowed values. Now
consider all sequences of words which contain exactly $mq_1$ occurrences of
word $W_1$, $mq_2$ occurrences of word $W_2$, etc. Here, m is any multiple of
the least common denominator of the $q_i$. All of these sequences contain
exactly m words and are of length exactly $\sum_i mq_i\ell_i$. The number of these
sequences is the multinomial coefficient, which is at least
$$\frac{m!}{(mq_1)!\,(mq_2)!\cdots(mq_s)!} \ge (m+1)^{-s}\,a^{-m\sum_i q_i\log_a q_i}.$$
This total number of sequences must be less than or equal to $a^{m\sum_i q_i\ell_i}$,
since this is the total number of possible sequences of the length in
question, and each of the sequences we have constructed must be different
for unique decoding. Thus
$$(m+1)^{-s}\,a^{-m\sum_i q_i\log_a q_i} \le a^{m\sum_i q_i\ell_i}.$$
Taking logarithms to the base a and dividing by m,
$$\sum_i q_i\ell_i \ge -\sum_i q_i\log_a q_i - \frac{s}{m}\log_a(m+1).$$
Using (3),
$$\sum_i p_i\ell_i \ge -\sum_i p_i\log_a p_i - \frac{s}{m}\log_a(m+1) - 2\epsilon.$$
Since $\epsilon$ is arbitrarily small and m can be arbitrarily large, we must have
the desired relation (2):
$$\sum_i p_i\ell_i \ge -\sum_i p_i\log_a p_i.$$
The inequality (1) is proved as follows. Let $p_i = A a^{-\ell_i}$, where A
is chosen so that $\sum_i A a^{-\ell_i} = 1$. Choose a set of rational $q_i$ summing
to one and approximating to the $p_i$, in the sense that
$$\Big|\sum_i p_i\ell_i - \sum_i q_i\ell_i\Big| < \epsilon, \qquad \Big|\sum_i p_i\log_a p_i^{-1} - \sum_i q_i\log_a q_i^{-1}\Big| < \epsilon.$$
Choose an integer m such that the $q_i m$ are all integers and consider
sequences containing exactly $q_i m$ occurrences of word $W_i$. Thus there are
m words in each sequence, and their length is $\sum_i q_i m\ell_i$. The total number
of sequences we construct is less than or equal to the total number
available, since unique decodability makes them all different. Hence,
$$\frac{m!}{(q_1m)!\,(q_2m)!\cdots(q_sm)!} \le a^{m\sum_i q_i\ell_i}.$$
Using the lower bound on the multinomial coefficient as before and taking
logarithms to the base a, we arrive at the same inequality for the $q_i$ as in
the proof of (2). Exactly the same argument as before leads to
$$\sum_i p_i\ell_i \ge -\sum_i p_i\log_a p_i,$$
and, replacing $p_i$ by its value $A a^{-\ell_i}$,
$$\sum_i A a^{-\ell_i}\ell_i \ge -\sum_i A a^{-\ell_i}\log_a\big(A a^{-\ell_i}\big) = \sum_i A a^{-\ell_i}\ell_i - \log_a A\sum_i A a^{-\ell_i},$$
so that, since $\sum_i A a^{-\ell_i} = 1$,
$$0 \ge -\log_a A, \qquad A = \Big(\sum_i a^{-\ell_i}\Big)^{-1} \ge 1.$$
This is the desired result (1).
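Both inequalities can be checked on small examples. The Python sketch below computes the left side of (1) exactly and decides decodability with the Sardinas-Patterson test, a standard algorithm that is not part of the text above; the function names are hypothetical, and words are written as strings over the a code letters.

```python
import math
from fractions import Fraction

def kraft_sum(words, a=2):
    """Left side of (1): sum of a**(-len(w)) over the words, as an exact fraction."""
    return sum(Fraction(1, a ** len(w)) for w in words)

def _quotient(A, B):
    """Nonempty w such that x + w == y for some x in A, y in B."""
    return {y[len(x):] for x in A for y in B if len(y) > len(x) and y.startswith(x)}

def is_uniquely_decodable(words):
    """Sardinas-Patterson: decodable iff no dangling suffix is itself a word."""
    code = set(words)
    if len(code) != len(words):
        return False                      # a repeated word is ambiguous at once
    suffixes = _quotient(code, code)
    seen = set()
    while suffixes and frozenset(suffixes) not in seen:
        if suffixes & code:
            return False
        seen.add(frozenset(suffixes))
        suffixes = _quotient(code, suffixes) | _quotient(suffixes, code)
    return True

def average_length(p, words):
    """Left side of (2): sum of p_i * l_i."""
    return sum(pi * len(w) for pi, w in zip(p, words))

def entropy(p, a=2):
    """Right side of (2): -sum p_i log_a p_i, with 0 log 0 = 0."""
    return -sum(pi * math.log(pi, a) for pi in p if pi > 0)
```

The set `["0", "01", "10"]` satisfies (1) with equality yet fails the test (`"010"` parses two ways), a reminder that (1) is necessary but not sufficient for decodability.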
Convexity of Channel Capacity as a Function of Transition Probabilities
Theorem: The channel capacity for transition probabilities $p_i(j)$ is a
convex downward function of these probabilities. That is, the capacity
C for the probabilities $r_i(j) = \frac12\big(p_i(j) + q_i(j)\big)$ satisfies the inequality
$$C \le \tfrac12(C_1 + C_2),$$
where $C_1$ is the capacity with probabilities $p_i(j)$ and $C_2$ that with
probabilities $q_i(j)$.
Proof: Let the capacity of the $r_i(j)$ channel be achieved by the input
probabilities $P_i$. Now consider the following channel. There are as
many inputs as in the given channels but twice as many outputs, a set j
and a set j'. Each input i has transitions $\frac12 p_i(j)$ and $\frac12 q_i(j')$. Now,
this is the channel we would obtain by halving all probabilities in the
$p_i(j)$ and the $q_i(j)$ channels and identifying the corresponding inputs
but leaving the outputs distinct. We note that if the corresponding
outputs are identified, the channel reduces to the $r_i(j)$ channel. We
note also that without this identification the channel looks like one
which half the time acts like the $p_i(j)$ channel and half the time like the
$q_i(j)$ channel. An identification of certain outputs always reduces
(or leaves equal) the rate of transmission. Let this channel be used with
probabilities $P_i$ for the input symbols. Then this inequality in rates
may be written
$$H(x) - \big(\tfrac12 H_{y_1}(x) + \tfrac12 H_{y_2}(x)\big) \ge H(x) - H_y(x) = C,$$
where $H_{y_1}(x)$ is the conditional entropy of x when y is in the j group and
$H_{y_2}(x)$ that when y is in the j' group. Splitting H(x) into two parts to
combine with $H_{y_1}(x)$ and $H_{y_2}(x)$, we obtain
$$\tfrac12 R_1 + \tfrac12 R_2 \ge C,$$
where $R_1$ is the rate for the $p_i(j)$ channel when the inputs have probabilities
$P_i$ and $R_2$ is the similar quantity for the $q_i(j)$ channel. These
rates, of course, are less than or equal to $C_1$ and $C_2$, respectively, since the
capacities are the maximum possible rates. Hence we get
$$C \le \tfrac12(C_1 + C_2).$$
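The theorem can be checked numerically. The Python sketch below estimates capacity with the Blahut-Arimoto iteration, a later algorithm swapped in here purely for illustration (it is not the method of the text), and compares C for the averaged channel $r_i(j)$ with $\frac12(C_1 + C_2)$. The name `capacity_bits` and the iteration limits are assumptions.

```python
import numpy as np

def capacity_bits(W, iters=5000, tol=1e-13):
    """Blahut-Arimoto estimate of capacity in bits; W[i, j] = Pr(output j | input i)."""
    W = np.asarray(W, dtype=float)
    P = np.full(W.shape[0], 1.0 / W.shape[0])        # start from uniform inputs

    def divergences(P):
        Q = P @ W                                    # output distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(W > 0, W / Q, 1.0)      # treats 0 log 0 as 0
        return np.sum(W * np.log2(ratio), axis=1)    # D(W_i || Q) for each input i

    for _ in range(iters):
        P_new = P * np.exp2(divergences(P))          # multiplicative update
        P_new /= P_new.sum()
        done = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if done:
            break
    return float(P @ divergences(P))                 # mutual information at final P
```

Averaging a binary symmetric channel with crossover 0.1 and its mirror image (crossover 0.9) gives a channel with identical rows and zero capacity, while each component has capacity $1 - h(0.1) \approx 0.531$ bits, so the inequality can be far from tight.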
A Geometric Interpretation of Channel Capacity
The calculations involved in determining the rate R and channel
capacity C for a discrete memoryless channel can be given an interesting
geometric formulation that leads to some insights into the properties of these
quantities.
Let a channel be defined by the matrix $\|p_i(j)\|$ of transition
probabilities from input letter i to output letter j ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$).
We can think of each row of this matrix as defining a vector or a point in
a $(b-1)$-dimensional simplex (the $(b-1)$-dimensional analog of a triangle,
tetrahedron, etc.). The coordinates of the point sum to one, $\sum_j p_i(j) = 1$,
and they are known as barycentric coordinates. They correspond, for example,
to the coordinates a chemist uses when he describes an alloy in terms of the
fractions of various components, and chemists often plot properties of alloys
in a simplex of one, two or three dimensions (line segment, triangle, or
tetrahedron).
We thus associate a point or vector $A_i$ with input i. Its components
are equal to the probabilities of the various output letters if only this input
were used. If all the inputs are used, with probability $P_i$ for input i,
the probabilities of the output letters are given by the components of the
vector sum
$$Q = \sum_i P_i A_i.$$
Q is a vector or point in the simplex corresponding to the output letter
probabilities. Its jth component is $\sum_i P_i p_i(j)$.
Now, for notational convenience, we define the entropy of a point or a
vector in a simplex to be that of the barycentric coordinates of the point
interpreted as probabilities. Thus we write
$$H(A_i) = -\sum_j p_i(j)\log p_i(j), \qquad i = 1, 2, \ldots, a,$$
$$H(Q) = -\sum_j\Big(\sum_i P_i p_i(j)\Big)\log\Big(\sum_i P_i p_i(j)\Big) = \text{entropy of the received distribution}.$$
In this notation, the rate of transmission R for a given set of input
probabilities is given by
$$R = H\Big(\sum_i P_i A_i\Big) - \sum_i P_i H(A_i) = H(Q) - \sum_i P_i H(A_i).$$
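The geometric form of R can be compared with the usual mutual-information sum. In this Python sketch (function names hypothetical, logarithms to base 2), the rows of `A` are the points $A_i$, `P` holds the input probabilities, and `rate_geometric` evaluates $H(Q) - \sum_i P_i H(A_i)$ directly.

```python
import numpy as np

def entropy(v):
    """Entropy (bits) of a point in the simplex, read as a distribution."""
    v = np.asarray(v, dtype=float)
    nz = v[v > 0]
    return float(-np.sum(nz * np.log2(nz)))

def rate_geometric(P, A):
    """R = H(Q) - sum_i P_i H(A_i), where Q = sum_i P_i A_i (rows of A)."""
    P = np.asarray(P, dtype=float)
    A = np.asarray(A, dtype=float)
    Q = P @ A
    return entropy(Q) - sum(p * entropy(row) for p, row in zip(P, A))

def rate_direct(P, A):
    """The same rate from the joint distribution: sum P_i p_i(j) log p_i(j)/Q_j."""
    P = np.asarray(P, dtype=float)
    A = np.asarray(A, dtype=float)
    Q = P @ A
    total = 0.0
    for i, p in enumerate(P):
        for j, a in enumerate(A[i]):
            if p > 0 and a > 0:
                total += p * a * np.log2(a / Q[j])
    return float(total)
```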
The function H(Q), where Q is a point in the simplex, is a convex
upward function. For if the components of Q are $x_j$, we have
$$H = -\sum_j x_j\log x_j, \qquad \frac{\partial H}{\partial x_j} = -(1 + \log x_j), \qquad \frac{\partial^2 H}{\partial x_j\,\partial x_k} = -\frac{\delta_{jk}}{x_j}.$$
Hence $\sum_{j,k}\frac{\partial^2 H}{\partial x_j\,\partial x_k}\,dx_j\,dx_k = -\sum_j\frac{(dx_j)^2}{x_j}$ is a negative definite form. This
is true in the space of all positive $x_j$ and, hence, certainly in the subspace
where $\sum_j x_j = 1$. It follows that the rate R above is always non-negative
and, indeed, since H is strictly convex (no flat regions), that R
is positive unless $\sum_i P_i A_i = A_k$ whenever $P_k \ne 0$.
The process of calculating R can be visualized readily in the cases
of two or three output letters. With three output letters, imagine an
equilateral triangle on the floor for the simplex containing the points $A_i$
and Q. Above this triangle is a rounded dome like the Kresge Auditorium.
The height of the dome at any point A is H(A). If there were three input
letters with corresponding vectors $A_1, A_2, A_3$, these correspond to three
points in the triangle and, straight up from these, to three points on the
dome. Any received vector $Q = \sum_i P_i A_i$ is a point within the triangle on the
floor defined by $A_1, A_2, A_3$. H(Q) is the height of the dome above the Q point,
and $\sum_i P_i H(A_i)$ is the height above Q of the plane defined by the three dome
points over $A_1, A_2, A_3$. In other words, R is the vertical distance over Q
from the dome down to the plane defined by these three points.
The capacity C is the maximum R. Consequently in this particular
case it is the maximum vertical distance from the dome to the plane. This
clearly occurs at the point of tangency of a plane tangent to the dome and
parallel to the plane defined by the input letters.
If there were four input letters, they would define a triangle or
a quadrilateral on the floor depending on their positions, and their vertical
points on the dome would in general define a tetrahedron. Using them with
different probabilities would give any point in the tetrahedron as the
subtracted value $\sum_i P_i H(A_i)$. Clearly, the maximum R would occur by choosing
probabilities which place this subtracted part on the lower surface of the
tetrahedron.
These remarks also apply if there are still more input letters. If
there are a input letters, they define an a-gon or less on the floor, and the
vertically overhead points on the dome produce a polyhedron. Any point in
the convex hull of the points obtained on the dome can be reached with
suitable choice of the $P_i$ and corresponds to some subtracted term in R.
It is clear that to maximize R and thus obtain C we need only consider
the lower surface of this convex hull.
It is also clear geometrically, from the fact that the lower surface
of the polyhedron is convex downward and the dome is strictly convex upward,
that there is a unique point at which the maximum R, that is C, occurs. For
if there were two such points, the point halfway between would be even
better, since the dome would go up above the line connecting the points at
the top and would be at least as low at the bottom surface. The rate R is
thus a strictly convex upward function of the received vector Q.
It is also true that the rate R is a convex upward function of the
input probability vector (with a barycentric coordinates $P_1, \ldots, P_a$
rather than the b coordinates of our other vectors). This is true since
the vectors Q and Q' corresponding to the input probabilities $P_i$ and $P_i'$
are given by
$$Q = \sum_i P_i A_i, \qquad Q' = \sum_i P_i' A_i.$$
The Q corresponding to $\alpha P_i + \beta P_i'$ (where $\alpha + \beta = 1$ and both are positive) is
$\alpha Q + \beta Q'$, and consequently the corresponding rate is at least $\alpha R + \beta R'$,
the desired result. Equality can occur when $Q = Q'$, so we cannot say in this
case that R is a strictly convex function.
These last remarks also imply that the set S of $P_i$ vectors which maximize
the rate at the capacity C forms a convex set in its $(a-1)$-dimensional simplex. If
the maximum is attained at two different points, it is also attained at all
points on the line segment joining these points. Furthermore, any local
maximum of R is the absolute maximum C; for if not, join the points corresponding
to the local maximum and the absolute maximum. The value of R must lie
on or above this line by the convexity property, but must lie below it when
sufficiently close to the local maximum to make it a local maximum. This
contradiction proves our statement.
Another property we may deduce is that the capacity C can always be
attained using not more than b of the input letters. This is because any
point on the surface of a b-dimensional polyhedron is interior to some face.
This face may be subdivided into $(b-1)$-dimensional simplexes (if it is not
already a simplex). The point is then interior to one of these. The vertices
of the simplex are b input letters, and the desired point can be
expressed in terms of these.
This picture gives considerable information concerning which input
letters should be used to achieve channel capacity. If the vector $A_t$, say,
corresponding to input letter t, is interior to the convex hull of the
remaining letters, it need not be used. Thus, suppose $A_t = \sum_{i\ne t}\alpha_i A_i$,
where $\sum_{i\ne t}\alpha_i = 1$ and $\alpha_i \ge 0$. Then by the convexity properties
$H(A_t) \ge \sum_{i\ne t}\alpha_i H(A_i)$. If by using the $A_i$ with probabilities $P_i$ we obtain a rate
$R = H\big(\sum_i P_i A_i\big) - \sum_i P_i H(A_i)$, then a rate greater than or equal to R can be
obtained by expressing $A_t$ in terms of the other $A_i$, for this leaves unaltered the
first term of R and decreases or leaves constant the sum.
In the case of only two output letters the situation is extremely simple.
Whatever the number of input letters, only two of them need be used to
achieve channel capacity. These two will be those with the maximum and minimum
transition probabilities to one of the output letters. These values, $p_{\max}$
and $p_{\min}$, are then located in the one-dimensional simplex, a line segment
of unit length, and projected upward to the H-curve as shown in the accompanying
figure. The secant line is drawn, and the capacity is the largest vertical distance
from the secant to the curve. The probabilities to achieve this capacity are
such that each of the two letters receives a probability proportional to the
distance from this point to the opposite end of the secant.
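For two output letters the secant construction reduces to a short calculation: the tangency point t is where the slope of the binary entropy function, $\log_2\frac{1-t}{t}$, equals the slope of the secant. The Python sketch below (names hypothetical, rates in bits) takes $x_i = \Pr(\text{output } 1 \mid \text{input } i)$, keeps only the extreme letters, and returns the capacity together with the input weights.

```python
import math

def h2(x):
    """Binary entropy in bits, with 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def two_output_capacity(x):
    """x[i] = Pr(output 1 | input i).  Returns (C, x_lo, x_hi, w_hi),
    where w_hi is the probability given to the letter at x_hi."""
    lo, hi = min(x), max(x)              # only the extreme letters are used
    if hi - lo < 1e-15:
        return 0.0, lo, hi, 0.0          # all points coincide: zero capacity
    slope = (h2(hi) - h2(lo)) / (hi - lo)
    # Point where the entropy curve is parallel to the secant.
    t = 1.0 / (1.0 + 2.0 ** slope)
    C = h2(t) - (h2(lo) + slope * (t - lo))   # largest vertical gap
    w_hi = (t - lo) / (hi - lo)          # proportional to distance to the opposite end
    return C, lo, hi, w_hi
```

For the Z-channel `[0.0, 0.5]` this gives $C = \log_2 1.25 \approx 0.322$ bits with weight 0.4 on the noisy input; adding interior letters, as in `[0.1, 0.4, 0.9]`, leaves the answer unchanged, in line with the remark that letters inside the convex hull need not be used.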
In the case of three output letters, the positions of all vectors
corresponding to input letters may be plotted in an equilateral triangle.
The circumscribing polygon (convex hull) of these points may then be taken,
and any points interior to this polygon (including those on edges) may be
deleted. What is desired is the lower surface of the polyhedron determined
by the points on the H-surface above these points. This lower surface, in
general, will consist of triangles, and the problem is to determine which
vertices are connected by edges. A method of doing this is to consider a
line joining a pair of vertices and then to calculate, for other lines whose
projections on the floor cross this line, whether they are above it or below
it in space. If there is no line below the first line, this line is an
edge on the lower surface of the polyhedron. If a second line is found
below the first line, this one may be tested in a similar fashion, and
eventually an edge is isolated. This edge divides the projection into two
smaller polygons, and these may now be studied individually by the same means.
Eventually, the original polygon will be divided by edges into a set of
polygons corresponding to faces of the polyhedron. Each of these polygons may
then be examined to determine whether or not the point of tangency of the
parallel plane which is tangent to the H-surface lies over the polygon.
This will happen in exactly one of the polygons and corresponds to the Q for
maximum R.
Log Moment Generating Function for the Square of a Gaussian Variate
Suppose x is a gaussian random variable with variance $\sigma^2$. Its
density function is
$$p(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}\,dx.$$
The random variable $u = x^2$ will have a density distribution q(u) obtained
by substituting $x = \sqrt{u}$, $dx = du/(2\sqrt{u})$ and then multiplying the result by
2. This last operation takes account of the two halves of the original
distribution, which both go into the positive u range. The result of these
substitutions is
$$q(u)\,du = \frac{1}{\sqrt{2\pi u}\,\sigma}\,e^{-u/2\sigma^2}\,du, \qquad u > 0.$$
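The moment generating function $\varphi(s)$ is calculated as follows:
$$\varphi(s) = \int_0^\infty e^{su}\,q(u)\,du = \int_0^\infty\frac{1}{\sqrt{2\pi u}\,\sigma}\,e^{-\left(\frac{1}{2\sigma^2} - s\right)u}\,du = \frac{\tau}{\sigma}\int_0^\infty\frac{1}{\sqrt{2\pi u}\,\tau}\,e^{-u/2\tau^2}\,du = \frac{\tau}{\sigma} = \frac{1}{\sqrt{1 - 2s\sigma^2}}.$$
In the third expression we make the substitution $\frac{1}{2\tau^2} = \frac{1}{2\sigma^2} - s$. The integral
in the third expression is recognized as integrating to 1, being, in fact, a
special case of the density function q(u) above. Notice that the integral,
and hence $\varphi(s)$, exists only when $s < 1/(2\sigma^2)$.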
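The log of the moment generating function and other useful functions
can now be calculated. We have
$$\mu(s) = \log\varphi(s) = -\tfrac12\log(1 - 2s\sigma^2), \qquad \mu'(s) = \frac{\sigma^2}{1 - 2s\sigma^2},$$
$$\mu(s) - s\mu'(s) = -\tfrac12\log(1 - 2s\sigma^2) - \frac{s\sigma^2}{1 - 2s\sigma^2},$$
$$\mu(s) - (s+1)\mu'(s) = -\tfrac12\log(1 - 2s\sigma^2) - \frac{(s+1)\sigma^2}{1 - 2s\sigma^2},$$
$$\mu''(s) = \frac{2\sigma^4}{(1 - 2s\sigma^2)^2} = 2(\mu')^2.$$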
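The closed forms above can be checked numerically. This Python sketch (names hypothetical) compares $\varphi(s) = (1 - 2s\sigma^2)^{-1/2}$ with a direct Riemann sum for $E[e^{sx^2}]$ and codes $\mu$, $\mu'$, $\mu''$; the grid half-width of 15 standard deviations of the tilted gaussian is an arbitrary choice.

```python
import math
import numpy as np

def mgf_closed_form(s, sigma2):
    """phi(s) = E[exp(s x^2)] for x ~ N(0, sigma2); requires s < 1/(2 sigma2)."""
    if s >= 1.0 / (2.0 * sigma2):
        raise ValueError("the integral diverges for s >= 1/(2 sigma^2)")
    return 1.0 / math.sqrt(1.0 - 2.0 * s * sigma2)

def mgf_numeric(s, sigma2, points=200001):
    """Riemann-sum estimate of E[exp(s x^2)] over a wide symmetric grid."""
    if s >= 1.0 / (2.0 * sigma2):
        raise ValueError("the integral diverges for s >= 1/(2 sigma^2)")
    tau2 = sigma2 / (1.0 - 2.0 * s * sigma2)   # the tau^2 of the substitution
    half_width = 15.0 * math.sqrt(tau2)
    xs = np.linspace(-half_width, half_width, points)
    dx = xs[1] - xs[0]
    # exp(s x^2) times the N(0, sigma2) density, combined into one exponent.
    vals = np.exp((s - 1.0 / (2.0 * sigma2)) * xs**2) / math.sqrt(2.0 * math.pi * sigma2)
    return float(np.sum(vals) * dx)

def mu(s, sigma2):
    """Log moment generating function of x^2."""
    return -0.5 * math.log(1.0 - 2.0 * s * sigma2)

def mu1(s, sigma2):
    """First derivative mu'(s)."""
    return sigma2 / (1.0 - 2.0 * s * sigma2)

def mu2(s, sigma2):
    """Second derivative mu''(s), equal to 2 mu'(s)^2."""
    return 2.0 * sigma2**2 / (1.0 - 2.0 * s * sigma2) ** 2
```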
Upper Bound on $P_e$ for the Gaussian Channel by an Expurgated Random Code
In the gaussian channel with average power limitation we assume code
words chosen at random in a sphere of radius $\sqrt{P}$. If the number of
dimensions n is large enough, the fraction of points at a radius between
$(1 - \delta)\sqrt{P}$ and $\sqrt{P}$ will be greater than $1 - \epsilon$ for any positive $\epsilon$ and $\delta$.
We wish to calculate the rate $R = \frac{1}{n}\log M$ for a random code such that the
expected number of code points within D of a given code point is less than
or equal to one-half. In the figure,
O is the origin and X is a code word at radius $\sqrt{P}$. The sphere of radius D
centered on X intersects the original sphere of radius $\sqrt{P}$ in an (n − 1)-sphere
whose intersection with the plane of our drawing consists of the points Y and
Z. All points interior to both spheres are included in the sphere with center
on OX and radius $\sqrt{A}$ (in n dimensions), where $\sqrt{A}$ is half the distance YZ.
Hence, the volume common to the two spheres is less than or equal to the volume
of this sphere, which is $K_n A^{n/2}$, where $K_n$ is the coefficient of $r^n$ in the formula
for the volume of an n-sphere. The total volume of the $\sqrt{P}$ sphere is $K_n(\sqrt{P})^n$.
If there are $e^{nR}$ points chosen at random in the $\sqrt{P}$ sphere, the expected
number within distance D of one of the points, such as X, will be less than
$$e^{nR}\,\frac{K_n A^{n/2}}{K_n(\sqrt{P})^n} = e^{nR}\Big(\frac{A}{P}\Big)^{n/2}.$$
Now if $R = \log\sqrt{P/A} - \delta$, this expected number approaches zero as $n \to \infty$
for any $\delta > 0$. If the point X is not on the surface of
the $\sqrt{P}$ sphere but at a slightly smaller radius, $\sqrt{P} - \epsilon$, the radius of the
sphere $\sqrt{A}$ is slightly larger, $\sqrt{A} + \delta_2$. However, by making $\epsilon$ approach
zero, $\delta_2$ approaches zero and its effect may be absorbed in $\delta$. Thus, if
in our original sphere with points distributed at random we first eliminate
all points except those within $\epsilon$ of the surface, the expected number within
D of one of the points will approach zero as $n \to \infty$, provided the rate $R = \log\sqrt{P/A} - \delta$
and $\epsilon$ is sufficiently small. By eliminating those points which have
neighbors within D we can still obtain a rate R as close as we wish to $\log\sqrt{P/A}$.
Now, since in the remaining expurgated code no point has a neighbor closer than
D, the probability of error may be calculated by our theorem on minimum
separations. It will be less than $e^{nR}$ times the probability of noise carrying
a point a distance D/2 or more. The distance D/2 can be related to
$\sqrt{A}$ by the obvious trigonometric equations
$$\frac{D}{2} = \sqrt{P}\,\sin\frac{\theta}{2}, \qquad \sqrt{A} = \sqrt{P}\,\sin\theta,$$
where θ is the angle XOY.
Making use of the theorem on reliability for a given minimum separation,
and the asymptotic formula for large n for erf x, we obtain
$$E \ge \frac{P}{2N}\sin^2\frac{\theta}{2} - R.$$
Eliminating θ by its relation to R, $\sin\theta = e^{-R}$, we get the final bound on reliability E:
$$E \ge \frac{P}{2N}\sin^2\Big(\frac12\sin^{-1}e^{-R}\Big) - R.$$
Note that as $R \to 0$ this lower bound approaches the same value as the upper
bound on E previously derived. Thus we conclude that $E(0) = P/4N$.
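The expurgated lower bound can be tabulated next to the minimum-distance upper bound, taken here as $E \le P/4N + R$. A minimal Python sketch, assuming rates in natural units and `snr` standing for $P/N$ (both function names are hypothetical):

```python
import math

def expurgated_lower_bound(R, snr):
    """E >= (P/2N) sin^2((1/2) arcsin e^{-R}) - R, with snr = P/N and R in nats."""
    if R < 0:
        raise ValueError("rate must be non-negative")
    theta = math.asin(math.exp(-R))          # sin(theta) = e^{-R}
    return 0.5 * snr * math.sin(theta / 2.0) ** 2 - R

def min_distance_upper_bound(R, snr):
    """Upper bound E <= P/4N + R from the average-distance argument."""
    return snr / 4.0 + R
```

At $R = 0$ the lower bound equals $P/4N$ (for example 0.5 when `snr = 2.0`), so both curves meet there, which is the content of $E(0) = P/4N$.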
Lower Bound on $P_e$ in the Gaussian Channel by a Minimum Distance Argument
In a code of length n with M code words, let $m_{is}$ ($i = 1, 2, \ldots, M$;
$s = 1, 2, \ldots, n$) be the sth coordinate of code word i. We are assuming
an average power limitation P, so
$$\frac{1}{Mn}\sum_{i,s} m_{is}^2 \le P. \qquad (1)$$
We also assume an independent gaussian noise of power N added to each
coordinate.
We now calculate the average squared distance between all the $M(M-1)/2$
pairs of points in n-space corresponding to the M code words. The squared
distance from word i to word j is $\sum_s(m_{is} - m_{js})^2$. The average $\overline{D^2}$ between
all pairs will then be
$$\overline{D^2} = \frac{1}{M(M-1)}\sum_{s,i,j}(m_{is} - m_{js})^2.$$
Note that each distance is counted twice in the sum and also that the
extraneous terms included in the sum, where $i = j$, contribute zero to the
sum. Squaring the terms in the sum,
$$\overline{D^2} = \frac{2M\sum_{s,i} m_{is}^2 - 2\sum_s\big(\sum_i m_{is}\big)^2}{M(M-1)} \le \frac{2nMP}{M-1},$$
where we obtain the last step by using the inequality on the average power
(1) and by noting that the second term is necessarily non-positive.
If the average squared distance between pairs of points is at most $2nMP/(M-1)$,
there must exist a pair of points whose squared distance satisfies this inequality.
Each point in this pair is used 1/M of the time. The best detection for
separating this pair, if no other points were present, is by a plane
normal to and bisecting the joining line segment, and either point would
then give rise to a probability of error equal to that of the noise carrying
a point half this distance or more in a particular direction. We arrive,
therefore, at a probability of error
$$P_e \ge \frac{2}{M}\Pr\Big[\text{noise in a certain direction} \ge \sqrt{\frac{nMP}{2(M-1)}}\Big].$$
As $n \to \infty$, and assuming $M \to \infty$ also in such a way as to approach a definite
rate $\frac{1}{n}\log M \to R \le C$, we may translate this into a bound on the asymptotic
reliability. This is done by using the asymptotic formula $1 - \operatorname{erf}x \sim \frac{e^{-x^2}}{x\sqrt{\pi}}$.
Using this, taking the logarithm and dividing by n gives the simple upper
bound on reliability
$$E \le \frac{P}{4N} + R.$$
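The averaging identity and the resulting bound are easy to verify on a random constellation. The Python sketch below (names hypothetical; the code words are random gaussian points, not a designed code) computes $\overline{D^2}$ both directly and from the expanded form, and checks it against $2nMP/(M-1)$ with P taken as the constellation's actual average power.

```python
import numpy as np

def mean_sq_distance_direct(words):
    """Average |w_i - w_j|^2 over ordered pairs i != j (rows are code words)."""
    m = np.asarray(words, dtype=float)
    M = m.shape[0]
    diff = m[:, None, :] - m[None, :, :]      # shape (M, M, n); i == j terms are zero
    return float(np.sum(diff ** 2) / (M * (M - 1)))

def mean_sq_distance_expanded(words):
    """Same average via (2M sum m_is^2 - 2 sum_s (sum_i m_is)^2) / (M(M-1))."""
    m = np.asarray(words, dtype=float)
    M = m.shape[0]
    first = 2 * M * np.sum(m ** 2)
    second = 2 * np.sum(m.sum(axis=0) ** 2)   # never negative; dropping it gives the bound
    return float((first - second) / (M * (M - 1)))
```

Since some pair must lie at or below the average, the closest pair in any constellation obeys the same $2nMP/(M-1)$ bound, which is the step the argument uses.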
The Sphere Packing Bound for the Gaussian Power Limited Channel
The analog of the sphere packing argument can be carried out in an
interesting geometrical fashion for the gaussian channel. We assume an
average power limitation P and an independent gaussian noise in each of the
n coordinates, with variance N in each coordinate. Consider the n-sphere S whose
squared radius is $P + \delta_1$. Since the average squared radius to the signal words is
P or less, a fraction at least $\delta_1/(P + \delta_1)$ of these words are within the $P + \delta_1$
sphere; for, if not, the fraction greater than $1 - \delta_1/(P + \delta_1) = P/(P + \delta_1)$ at
squared distance at least $P + \delta_1$ would give more than P for the contribution to the
average power by themselves. We will estimate the errors due only to the signal
words inside the $P + \delta_1$ sphere. Even if all code words outside this sphere
never caused errors and only this minimum possible fraction $\delta_1/(P + \delta_1)$ were inside
the sphere, the probability of error for the entire code would be that of the
code consisting of these interior points multiplied by $\delta_1/(P + \delta_1)$, and in general
the probability of error will be greater than this. Thus the reliability of
the original code can be estimated from that of the interior points with
an error not exceeding $\frac{1}{n}\log\frac{P + \delta_1}{\delta_1}$.
The argument we will use is similar to that in the discrete channel but
with certain complexities and refinements added. We consider a sphere of
suitably chosen radius $\sqrt{K}$. The volume $V_K$ of this sphere will be divided by
the decoding process for the code into a number of regions, regions which are
decoded as the various particular signal points. To each signal point we
will assign a certain volume $V_1$ of "high" probability density and a second
volume $V_2$ of "low" probability density. These regions $V_1$ and $V_2$ are congruent
for the different signal points. The probability density of a point being
carried by noise into any part of its $V_1$ region will be greater than the
density for any part of its $V_2$ region. Both of these regions will, for any
signal point, lie entirely within the sphere of radius $\sqrt{K}$. The conclusion
will be that for any placing of $V_K/V_1$ points the probability of error will
be at least equal to the probability of a point being carried into its $V_2$ region.
This is because, in a way similar to the discrete process, starting with the
original partitioning of $V_K$ we can reallocate volume assigned to a given
point in order of decreasing probability density and equalize allocation between
points until each point has $V_1$ assigned to it. These operations preserve
total volume and decrease the (calculated) probability of error. When the
equalization is complete, each signal point has its $V_2$ region assigned entirely to
other points, and consequently the probability of error is at least that of
a point being taken to its $V_2$ region.
In the figure,
O is the origin, X is a signal point at maximal radius $\sqrt{P + \delta_1}$, and the large
circle is the intersection of the K sphere with the plane of the drawing. At
X we construct the hyperplane perpendicular to OX, and let the distance from
X to the intersection of this plane with the K sphere be $\sqrt{\lambda N + \delta_2}$, so that
$K = P + \delta_1 + \lambda N + \delta_2$. Here, N is the average noise power, λ is an arbitrary
multiplier, and $\delta_2$ is a small quantity which will eventually approach zero. Now
construct the two hemispheres of radii $\sqrt{\lambda N}$ and $\sqrt{\lambda N + \delta_2}$ centered on X,
pointed toward O and bounded by the hyperplane. It is clear that the entire volumes
of both of these hemispheres are within the large K sphere. The smaller hemisphere is
the $V_1$ region for signal point X, and the shell between the hemispherical
surfaces is the $V_2$ region. For any other signal point, a similar pair of
hemispheres is constructed by drawing the line from the origin to the signal
point, constructing the perpendicular hyperplane and constructing hemispheres
of radii $\sqrt{\lambda N}$ and $\sqrt{\lambda N + \delta_2}$ facing toward the origin. If the origin itself
were a signal point, any hyperplane through the origin may be used. It is
obvious in the drawing that any point of these hemispheres actually in the
plane of the drawing is within the K sphere (being nearer to the origin than
$\sqrt{K}$). But the plane of the drawing may be made to pass through any desired
point in the hemisphere by suitable rotation; hence the property is true in
general.
Since the probability density for a given displacement from a signal point
is a monotone decreasing function of the actual distance of displacement, the
probability density for any point in the shell is less than that for any
point in the inner hemisphere. Let M be the number of signal points such
that the combined volume of their small hemispheres is just equal to that of
the K sphere. Thus
$$M\cdot\tfrac12 K_n(\lambda N)^{n/2} = K_n K^{n/2}, \qquad M = 2\Big(\frac{P + \delta_1 + \lambda N + \delta_2}{\lambda N}\Big)^{n/2}.$$
Now, whatever the decoding system or the placement of M points interior to
the $\sqrt{P + \delta_1}$ sphere, the probability of error $P_e$ (due only to errors inside
the K sphere) will exceed the probability of a point being carried into its
$V_2$ shell. This follows from our general argument concerning reallocation
of volume in accordance with higher probability. Thus if the message placement
and decoding system allocate any volume in shells or other low probability
density regions to code points, a lower calculated $P_e$ would occur if this were
calculated as though at the higher probability density of the inner hemisphere.
When this reallocation is finished, we have a probability of error
satisfying
$$P_e \ge \tfrac12\Pr\big[\lambda N \le Z \le \lambda N + \delta_2\big],$$
where Z is the squared radial displacement of a point due to noise (divided
by n). Since nZ is the sum of the squares of n independent gaussian variates,
each with variance N, Z is distributed (apart from scale) according to the $\chi^2$
distribution with n degrees of freedom. Thus
$$\Pr\big[\lambda N \le Z \le \lambda N + \delta_2\big] = \int_{n\lambda}^{n(\lambda + \delta_2/N)}\frac{x^{n/2-1}e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}\,dx.$$
For any given $\delta_2 > 0$, the logarithm of this $\chi^2$ probability from λ to $\lambda + \delta_2/N$
is asymptotic to $-\frac{n}{2}(\lambda - 1 - \log\lambda)$. This can be easily shown by use of the moment
generating function and the results on the tails of distributions obtained
previously. Consequently our reliability E, as $n \to \infty$, is asymptotically less
than or equal to $\frac12(\lambda - 1 - \log\lambda)$. Also the rate is
$$R = \frac{1}{n}\log M = \frac12\log\frac{P + \lambda N + \delta_1 + \delta_2}{\lambda N} + \frac{1}{n}\log 2.$$
Since this is true for any $\delta_1, \delta_2 > 0$, we may omit them
entirely and obtain asymptotic bounds for E and R as follows:
$$E \le \frac12(\lambda - 1 - \log\lambda), \qquad R = \frac12\log\frac{P + \lambda N}{\lambda N}.$$
These formulas give an upper bound on the reliability curve in parametric
form, using the parameter λ, which ranges from 1 to ∞. With λ just
greater than 1, we have a rate just below channel capacity and a reliability
bound which is just slightly positive. As the value of λ increases, the
rate R decreases and the bound on E increases, becoming infinite when λ is
infinite and the bound on rate is zero. Of course the bound based on minimum
distance shows that the actual E curve does not exceed $P/4N$ as $R \to 0$.
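The parametric curve can be traced numerically. In the Python sketch below (names hypothetical, natural logarithms, `snr` standing for $P/N$), `sphere_packing_point` returns $(R, E)$ for a given λ ≥ 1, and `chi2_exponent` evaluates $-\frac1n\log$ of the density of $\chi^2_n/n$ at λ exactly via `math.lgamma`, so its approach to $\frac12(\lambda - 1 - \log\lambda)$ can be watched as n grows.

```python
import math

def sphere_packing_point(lam, snr):
    """(R, E_upper) on the parametric curve for multiplier lam >= 1, in nats."""
    if lam < 1:
        raise ValueError("the parameter lambda must be at least 1")
    R = 0.5 * math.log(1.0 + snr / lam)
    E = 0.5 * (lam - 1.0 - math.log(lam))
    return R, E

def chi2_exponent(lam, n):
    """-(1/n) log of the density of chi2_n / n at lam, computed exactly."""
    x = n * lam
    log_f = ((n / 2 - 1) * math.log(x) - x / 2
             - (n / 2) * math.log(2.0) - math.lgamma(n / 2))
    # The density of chi2_n / n at lam is n times the chi2_n density at n * lam.
    return -(math.log(n) + log_f) / n
```

At λ = 1 the sketch returns the capacity $\frac12\log(1 + P/N)$ with a zero bound on E, and increasing λ moves down the rate axis and up the reliability axis, as described above.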
The T-Terminal Channel
Almost all previous work on coding theory has dealt with a one-directional
channel having an input or transmitting point and an output or receiving
point, or, at most, with this arrangement plus a feedback channel from the
receiving point to the transmitting point whose function was thought of as a
possible aid in forward communication. Many cases arise, however, in which
a number of information terminals are involved and both backward and forward
communication is of interest, perhaps between all pairs of terminals. As
examples we may cite telephony (or even ordinary direct conversation), where
communication in both directions is important, or a network of radio or
television stations in which there are a number of communication links
using a common medium.
A further complication is introduced by the possibility of competition
or conflicting interest among the individuals controlling the operation of
the various terminals. As an example we have the case of a secrecy system
which is best thought of as a three-terminal channel with the transmitter
as one input, a receiver as one output and the enemy cryptanalyst as a
second output. The object is to transmit information from the transmitter
to the receiver without knowledge by the enemy. A second example is the
problem of "jamming", again a three-terminal channel, but now the enemy
has an input rather than an output and his object is to reduce or eliminate
the direct transmission of information.
These possibilities suggest that we should frame general definitions of
T-terminal channels and study their characteristics from the information
theoretic point of view. We shall here, for simplicity, limit ourselves to
the discrete case quantized in time.
Definition: A T-terminal finite state channel consists of T inputs x_i
(i = 1, 2, ..., T), each of which may assume values from a finite alphabet
(not necessarily the same for the different inputs), T outputs y_1, y_2, ..., y_T,
each of which can assume values from an associated finite alphabet, and a
state variable S which can assume any of a finite set of values 1, ..., D.
Finally, there are conditional probabilities for the next outputs and the
next state, conditional on the current inputs and current state:
$$\Pr(y_1, \ldots, y_T \mid S, x_1, \ldots, x_T) \quad\text{and}\quad \Pr(S' \mid S, x_1, \ldots, x_T).$$
Definition: A memoryless T-terminal finite state channel is one in which the
state S can assume only a single value.
Definition: A noiseless T-terminal discrete channel is one in which all
probabilities are either 0 or 1. Thus, the next state and the next outputs
are strictly determined by the current state and current inputs. In the
noiseless memoryless case, the state can have only one value, so the next
outputs are functions of the current inputs.
In operation of a T-terminal channel we imagine operators or equipment
at each of the terminals. Also at each terminal, in general, will be an
information source. The operators are attempting to transmit information
produced by the sources between the terminals according to some general plan
and system of codes which has been agreed upon. In general, the operator
at terminal i can control the input i, but only as a function of the data
available to him at the time. This includes the past and present of output
i and the output of message source i up to the present time, but not the
future of these random functions, nor any of the other inputs, outputs or
message sources (past or future).
We will first consider the completely cooperative situation, in which
the operation of all terminals is directed toward a common end. The
problem is very similar to a one-person game in the game theoretic sense, with
a "split personality" for the player. We can think of the operators at the
various terminals conferring at the beginning on a general strategy, selection
of codes and decoding operations, and then going to their respective terminals
and operating the system according to the agreed-upon plan. Together they
act like a single player whose knowledge in making different moves is not
coextensive.
In the more general case, one may consider a p-person game in which the
T terminals are partitioned into p subsets, the operators in each subset
having a common purpose which may conflict with those of other subsets. The
operators in a given subset agree on a strategy to promote their goals and
act as one person in a kind of p-person game.
In the fully cooperative case there are many utilities one might wish
to maximize in a given channel. In line with basic coding theory, however,
our attention is directed to the question of generalizing the coding theorem
for a noisy channel to this kind of situation. In other words, we would
like to find the capabilities and limitations of a T-terminal channel with
regard to essentially errorless transmission of information between the
different terminals. At a given terminal, say terminal 1, we may imagine that
the information source 1 produces information which is destined for various
other terminals 2, 3, ..., T. It might also produce some information which
was intended for both terminals 2 and 3, some for both 2 and 4, etc., and
indeed it might have a component intended for any subset of the other
terminals. The same may of course be said of any other terminal. In general,
we think of each message source as producing not one but 2^{T−1} − 1 streams of
independent information intended for the 2^{T−1} − 1 subsets (omitting the null
subset) of the other T − 1 terminals.
A simple two-terminal one-way channel is characterised at the simplest
coding level by its capacity C. In the T-terminal case, we must consider
the capacities of all the different types just described, that is, C_{iσ}, the
capacity from terminal i to subset σ of the remaining terminals, a total of
T(2^{T−1} − 1) different capacities. Furthermore, these are not fixed
quantities but, in general, capable of some variability. Thus, one may increase
one of these capacities at the expense of reducing another. Our fundamental
problem is not to evaluate a single C as before but to find which sets of
values of C_{iσ} are possible.
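The count T(2^{T−1} − 1) can be checked by listing the labels directly. This is a minimal sketch. `capacity_labels` is a hypothetical helper that pairs each terminal i with every non-empty subset σ of the other terminals, giving one label per capacity C_{iσ}.

```python
from itertools import combinations


def capacity_labels(T: int) -> list[tuple[int, frozenset[int]]]:
    """List every pair (i, sigma): a sending terminal i and a non-empty
    subset sigma of the other T - 1 terminals, i.e. one label per C_{i sigma}."""
    labels = []
    for i in range(1, T + 1):
        others = [j for j in range(1, T + 1) if j != i]
        for size in range(1, len(others) + 1):
            for sigma in combinations(others, size):
                labels.append((i, frozenset(sigma)))
    return labels


for T in (2, 3, 4):
    # number of enumerated labels next to the closed form T(2^{T-1} - 1)
    print(T, len(capacity_labels(T)), T * (2 ** (T - 1) - 1))
```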
In the case of only two terminals but with an input and output at each
terminal, there are only two different capacities C_{iσ}, since there is only
one non-null subset of the remaining terminals. These capacities we may
write C_12 and C_21. Our problem is to find the possible values of the pair
(C_12, C_21) or, better, the boundary of this domain in the (C_12, C_21) space.
This boundary may be called the capacity surface.
The channel in Fig. 1 is a simple example, where the two boxes represent
ordinary one-way memoryless channels with capacities C_1 and C_2. The graph
at the right of Fig. 1 shows the region of attainable rates in the two
directions, and the heavy-line boundary of this is the capacity surface. In this
case transmission in either direction neither aids nor hinders transmission
in the reverse direction (feedback cannot increase forward transmission in a
memoryless channel).
The channel in Fig. 2 is more interesting from this point of view. The
two binary inputs from the two terminals are added mod 2 and the output is
a common output going to both terminals. Here again it is possible to
achieve points in a rectangle. Note that at each terminal the transmitted
symbol should be added mod 2 to the next received symbol to compensate for
its effect. It is curious that, in a sense, two bits per time interval are
going through the vertical line of the drawing, one destined in each
direction.
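The following is a small simulation of the Fig. 2 channel. It assumes binary inputs and the compensation rule just described, in which each terminal adds its own transmitted bit mod 2 to what it receives. The function name is illustrative.

```python
import random


def two_way_mod2(bits_1: list[int], bits_2: list[int]) -> tuple[list[int], list[int]]:
    """Fig. 2 channel: both terminals observe the common output y = x1 XOR x2.

    Returns (what terminal 1 decodes, what terminal 2 decodes). Each terminal
    adds its own transmitted bit mod 2 to the received symbol to cancel it.
    """
    decoded_at_1, decoded_at_2 = [], []
    for x1, x2 in zip(bits_1, bits_2):
        y = x1 ^ x2                   # common output sent to both terminals
        decoded_at_1.append(y ^ x1)   # equals x2, the bit sent from terminal 2
        decoded_at_2.append(y ^ x2)   # equals x1, the bit sent from terminal 1
    return decoded_at_1, decoded_at_2


rng = random.Random(1)
a = [rng.randint(0, 1) for _ in range(20)]
b = [rng.randint(0, 1) for _ in range(20)]
got_1, got_2 = two_way_mod2(a, b)
print(got_1 == b, got_2 == a)   # one bit per use in each direction
```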
Another channel is indicated in Fig. 3. There are three input letters
a, b, c at the left terminal and three input letters A, B, C at the right
terminal. If a is used at the left, the channel from the right is as shown
in the figure, a channel with capacity 1. B or C come through to the
corresponding received letters B' and C', while A divides with probability 1/2 between
these. If b or c is used, the channel from right to left has zero capacity,
all letters A, B, C dividing equally between B' and C'. In the reverse
direction, the situation is similar with capital letters exchanged for small
letters. Thus there is a direct conflict between sending information to
the right or the left. Any point in the triangular region can be attained
but, we suspect, nothing outside. To obtain a point on the diagonal boundary,
say C_12 = x and C_21 = 1 − x, the channel may be used x of the time to the right
(that is, the right-hand operator uses A) and 1 − x of the time to the left
(the left-hand operator uses a). In each case, the other operator sends at
full capacity.
In the general T-terminal memoryless channel, essentially this
apportionment of time may be carried out to prove the following theorem.
Theorem: The capacity surface is convex outward. That is, if the sets
C_{iσ} and C'_{iσ} can be attained (where i ranges over the terminals and σ over
subsets of terminals excluding i), then the set of capacities
$$\lambda C_{i\sigma} + (1 - \lambda)\, C'_{i\sigma}, \qquad 0 \le \lambda \le 1,$$
can also be attained.
This is proved readily by subdividing the time between the coding
systems which give C_{iσ} and C'_{iσ} in the ratios λ and 1 − λ. If these are irrational,
they may of course be approximated by a sequence of rationals.
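The time-sharing argument can be sketched as follows, assuming each code system is summarized by its vector of rates. The Fig. 3 corners (C_12, C_21) = (1, 0) and (0, 1) serve as the example, and `Fraction.limit_denominator` stands in for the sequence of rational approximations. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def time_share(c_a: tuple[float, ...], c_b: tuple[float, ...], lam: float) -> tuple[float, ...]:
    """Rates obtained by running code system a a fraction lam of the time and
    code system b the remaining 1 - lam (componentwise over all C_{i sigma})."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(c_a, c_b))


def block_schedule(lam: float, max_blocks: int = 1000) -> tuple[int, int]:
    """Rational approximation k/N of lam: use system a in k of every N blocks."""
    frac = Fraction(lam).limit_denominator(max_blocks)
    return frac.numerator, frac.denominator


# Corners of the Fig. 3 triangle, written as (C_12, C_21).
right_only, left_only = (1.0, 0.0), (0.0, 1.0)
print(time_share(right_only, left_only, 0.3))   # a point on the diagonal boundary
print(block_schedule(math.sqrt(0.5)))           # irrational share, approximated
```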
Conditions for Constant Mutual Information
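Theorem: In a channel with transition probabilities p_i(j) and input probabilities P_i,
necessary and sufficient conditions that the mutual information be constant
(that is, that log[p_i(j)/P(j)] take the same value for every pair i, j that can occur)
are that
(1) p_i(j) = a function of j only, for all input letters i that can cause j;
(2) Σ_{i∈S_j} P_i = h, independent of j, where S_j is the set of input letters
that can cause output letter j.
We also have h^{−1} = e^I, where I is the constant information value.
Proof: Suppose
$$\log \frac{p_i(j)}{\sum_k P_k\, p_k(j)} = I.$$
Then p_i(j) = e^I Σ_k P_k p_k(j), a function of j only. Also, if q_j(i) is the
conditional probability of i given j, then
$$q_j(i) = \frac{P_i\, p_i(j)}{\sum_k P_k\, p_k(j)} = e^{I} P_i, \qquad 1 = \sum_{i \in S_j} q_j(i) = e^{I} \sum_{i \in S_j} P_i,$$
so that Σ_{i∈S_j} P_i = e^{−I} is independent of j.
To prove the sufficiency, assume (1) and (2). From (1)
$$q_j(i) = \frac{P_i\, p_i(j)}{\sum_{k \in S_j} P_k\, p_k(j)} = \lambda_j P_i, \qquad i \in S_j,$$
where λ_j = p_i(j)/P(j) does not depend on i. Now summing P_i λ_j = q_j(i) over i ∈ S_j and using (2),
$$h\,\lambda_j = 1,$$
so λ_j = h^{−1} is independent of j. Hence I = log h^{−1}.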
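Here is a numerical check of the theorem on a small hypothetical channel that does not appear in the text. It has three inputs, each reaching two of three outputs with probability 1/2, and the inputs are used equiprobably. Conditions (1) and (2) hold with h = 2/3, so every pair that can occur should carry the same information, log(3/2) nats.

```python
import math


def pair_informations(P: list[float], p: list[list[float]]) -> dict[tuple[int, int], float]:
    """log(p_i(j) / P(j)) for every pair (i, j) that can occur (P_i > 0, p_i(j) > 0)."""
    n_out = len(p[0])
    out_prob = [sum(P[i] * p[i][j] for i in range(len(P))) for j in range(n_out)]
    return {(i, j): math.log(p[i][j] / out_prob[j])
            for i in range(len(P)) for j in range(n_out)
            if P[i] > 0 and p[i][j] > 0}


# Hypothetical channel: inputs a, b, c; each reaches two of three outputs with probability 1/2.
p = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
P = [1 / 3, 1 / 3, 1 / 3]
h = [sum(P[i] for i in range(3) if p[i][j] > 0) for j in range(3)]   # condition (2)
print(set(round(v, 12) for v in pair_informations(P, p).values()), h)
```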
Simple Proof that H_y(x) ≤ H(x)
We wish to prove (with the roles of the two variables interchanged, which is immaterial) that
$$-\sum_{i,j} P(i,j)\log p_i(j) \le -\sum_j P(j)\log P(j).$$
We will prove this for each particular j; summing on j will then give the
desired result. Thus we will show
$$-\sum_i P(i,j)\log p_i(j) \le -P(j)\log P(j)$$
or
$$-\sum_i P(i)\,p_i(j)\log p_i(j) \le -\sum_i P(i)\,p_i(j)\,\log\sum_i P(i)\,p_i(j).$$
Consider φ(x) = x log x. This function is convex downward for x > 0, since
φ''(x) = 1/x > 0. Therefore it satisfies the inequality (see Hardy, Littlewood
and Pólya, "Inequalities", p. 74)
$$\varphi\Big(\sum_i q_i x_i\Big) \le \sum_i q_i\,\varphi(x_i), \qquad q_i \ge 0,\ \sum_i q_i = 1.$$
Take x_i = p_i(j) and q_i = P(i):
$$\sum_i P(i)\,p_i(j)\log\sum_i P(i)\,p_i(j) \le \sum_i P(i)\,p_i(j)\log p_i(j).$$
This is, after multiplication by −1 and summation on j, the desired inequality.
Equality occurs only if all p_i(j) for a given j are equal. Then p_i(j) = P(j)
and P(i, j) = P(i)P(j). That is, the two events are independent.
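The inequality is easy to spot-check numerically. The sketch below draws random joint distributions P(i, j), an illustrative choice, and compares the conditional entropy with the output entropy in bits. A product distribution should give equality.

```python
import math
import random


def conditional_and_output_entropy(joint: list[list[float]]) -> tuple[float, float]:
    """Given joint[i][j] = P(i, j), return (H_x(y), H(y)) in bits, with p_i(j) = P(i, j)/P(i)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_y = -sum(q * math.log2(q) for q in p_y if q > 0)
    h_y_given_x = -sum(joint[i][j] * math.log2(joint[i][j] / p_x[i])
                       for i in range(len(joint)) for j in range(len(joint[i]))
                       if joint[i][j] > 0)
    return h_y_given_x, h_y


def random_joint(rng: random.Random, a: int, b: int) -> list[list[float]]:
    """An a-by-b joint distribution with random (normalised) entries."""
    w = [[rng.random() for _ in range(b)] for _ in range(a)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


rng = random.Random(0)
for _ in range(3):
    print(conditional_and_output_entropy(random_joint(rng, 3, 4)))
```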
The Central Limit Theorem with Large Deviations
The central limit theorem states that under certain general conditions
the sum of n independent random variables is approximately Gaussian in the
neighborhood of its mean value when n is large. The most common theorems
of this class give good estimates of the probability of deviations of the
order of √n from the mean, while more advanced results with added terms
(for example, the results on p. 147 of Feller, Probability Theory and Its
Applications) allow somewhat larger deviations but still require that the
deviation from the mean divided by n approach zero for the estimate to be
asymptotic to the correct value with large n.
We will develop asymptotic formulas under certain conditions for the
probability density, the probabilities of the tails of the distributions,
etc., for arbitrary deviations. In the usual central limit theorem, the
behavior near the mean is related to the characteristic functions or, as
we prefer here, the moment-generating functions near the value zero. It is
interesting that the results here show that the distribution remote from the
mean is related in a very similar fashion to the moment-generating functions
at arguments away from zero. Thus we are able to attach a fairly direct
significance to the value and derivatives of the moment-generating functions
at non-zero arguments. Indeed, the method of derivation of our asymptotic
estimates is a kind of manipulation trick whereby points away from zero are
translated into zero. This device is due to Esscher and has been used by
Cramér in a manner similar to our analysis. However, our results go further
than those of Cramér, most of whose work applied only near the mean of the
distribution.
Let F(x) = Pr{U ≤ x} be the distribution function for the random
variable U. The moment-generating function is then
$$\varphi(s) = \int_{-\infty}^{\infty} e^{sx}\, dF(x).$$
Let this converge in the range A < s < B (either or both of A and B may be infinite).
We are interested only in cases where A < 0 < B. This includes distribution
functions which are bounded in range or which approach zero and one
exponentially or faster, as with the Gaussian distribution or the distribution
whose density is ½e^{−|x|}.
The moment-generating function is an analytic function of s (thought of
as a complex variable) in the strip where A < Re s < B. If n variables,
all independent and distributed according to the same F(x), are added, the
sum X is distributed according to the n-fold convolution F_n(x). The
moment-generating function of F_n(x) is
$$\varphi_n(s) = [\varphi(s)]^n.$$
We wish to estimate F_n(λn) when n is large.
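For a discrete F, the relation φ_n(s) = [φ(s)]^n can be confirmed by carrying out the convolution explicitly. The three-point distribution below is an arbitrary assumption, chosen only to keep the sums small.

```python
import math


def convolve(d1: dict[float, float], d2: dict[float, float]) -> dict[float, float]:
    """Distribution of X1 + X2 for independent discrete X1, X2 given as {value: probability}."""
    out: dict[float, float] = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


def mgf(dist: dict[float, float], s: float) -> float:
    """phi(s) = sum of p e^{s x}: the moment-generating function of a discrete distribution."""
    return sum(p * math.exp(s * x) for x, p in dist.items())


F = {-1: 0.2, 0: 0.5, 2: 0.3}   # illustrative F(x), not from the text
n = 5
Fn = F
for _ in range(n - 1):
    Fn = convolve(Fn, F)         # n-fold convolution F_n
for s in (-0.7, 0.0, 0.4):
    print(s, mgf(Fn, s), mgf(F, s) ** n)
```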
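Consider a new random variable u whose distribution function G(x) is
defined by
$$G(x) = \frac{\int_{-\infty}^{x} e^{s_0 y}\, dF(y)}{\int_{-\infty}^{\infty} e^{s_0 y}\, dF(y)}.$$
Here s_0 is an arbitrary real constant lying between A and B.
The moment-generating function for G(x) is
$$\varphi_G(s) = \frac{\int_{-\infty}^{\infty} e^{(s+s_0)x}\, dF(x)}{\int_{-\infty}^{\infty} e^{s_0 x}\, dF(x)} = \frac{\varphi(s+s_0)}{\varphi(s_0)}. \qquad (1)$$
The mean and variance of the G distribution may be found from the first and
second derivatives of φ_G(s) evaluated at s = 0. Thus
$$\text{mean} = \frac{\varphi'(s_0)}{\varphi(s_0)}, \qquad \text{variance} = \frac{\varphi''(s_0)}{\varphi(s_0)} - \left[\frac{\varphi'(s_0)}{\varphi(s_0)}\right]^2.$$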
Now suppose n variables, all independent and distributed according to G(x),
are added. The sum will be distributed according to G_n(x), with the
moment-generating function
$$\left[\frac{\varphi(s+s_0)}{\varphi(s_0)}\right]^n = \frac{\varphi_n(s+s_0)}{\varphi_n(s_0)}.$$
This implies that
$$dG_n(x) = \frac{e^{s_0 x}\, dF_n(x)}{[\varphi(s_0)]^n},$$
since a shift by s_0 in the argument of the generating function corresponds
to multiplication by e^{s_0 x} in the distribution function.
Thus the distribution G(x) after n-fold convolution is still closely
related to the n-fold convolution of F(x):
$$dF_n(x) = [\varphi(s_0)]^n e^{-s_0 x}\, dG_n(x).$$
The basic method of using this relation to study the behavior of
the distribution F_n(x) is as follows. A value of s_0 is chosen in such a
way as to make the mean of the G_n distribution occur at the value x of F_n
in which we are interested. When this is done, G_n(x) can be estimated
well from the ordinary central limit theorems, since these are particularly
good at and near the mean. The relation between F_n and G_n is then used
to translate estimates of G_n behavior into estimates of F_n behavior.
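The tilting construction and the relation dF_n(x) = [φ(s_0)]^n e^{−s_0 x} dG_n(x) can be checked exactly for a small discrete distribution. In this sketch, which uses an illustrative F and hypothetical helper names, s_0 is found by bisection so that the mean of G sits at a chosen point a. This follows the recipe just described for moving the region of interest to the mean.

```python
import math


def convolve(d1, d2):
    """Distribution of the sum of two independent discrete variables ({value: prob})."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


def n_fold(dist, n):
    """n-fold convolution of dist with itself."""
    out = dist
    for _ in range(n - 1):
        out = convolve(out, dist)
    return out


def tilt(dist, s0):
    """G: reweight each atom of F by e^{s0 x} and divide by phi(s0). Returns (G, phi(s0))."""
    phi = sum(p * math.exp(s0 * x) for x, p in dist.items())
    return {x: p * math.exp(s0 * x) / phi for x, p in dist.items()}, phi


def choose_s0(dist, a, lo=-30.0, hi=30.0):
    """Bisection for s0 making the mean of G equal to a (needs min F < a < max F)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        g, _ = tilt(dist, mid)
        if sum(x * p for x, p in g.items()) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


F = {-1: 0.2, 0: 0.5, 2: 0.3}   # illustrative F with mean 0.4
n, a = 6, 1.2                     # region of interest x = n a, far above the mean n * 0.4
s0 = choose_s0(F, a)
G, phi = tilt(F, s0)
Fn, Gn = n_fold(F, n), n_fold(G, n)
for x in sorted(Fn):              # dF_n(x) = phi(s0)^n e^{-s0 x} dG_n(x), atom by atom
    print(x, Fn[x], phi ** n * math.exp(-s0 * x) * Gn[x])
```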
It is convenient to use in place of the moment-generating function
φ(s) its logarithm, which we will denote by μ(s). This function is
sometimes called the semi-invariant generating function. In terms of μ(s) we
have
$$dF_n(x) = e^{n\mu(s_0)}\, e^{-s_0 x}\, dG_n(x).$$
The successive derivatives of μ(s) evaluated at zero are called the
semi-invariants of the F distribution. In particular,
$$\mu(0) = 0,$$
$$\mu'(0) = \int x\, dF(x) = \text{mean of } F \text{ distribution},$$
$$\mu''(0) = \int x^2\, dF(x) - [\mu'(0)]^2 = \sigma^2 \text{ of } F \text{ distribution}.$$
For the G(x) distribution, the log moment generating function μ_G(s) is
given by (taking the logarithm of (1))
$$\mu_G(s) = \mu(s + s_0) - \mu(s_0).$$
Consequently, for all derivatives (using a superscript to denote differentiation)
$$\mu_G^{(k)}(0) = \mu^{(k)}(s_0), \qquad k = 1, 2, \ldots.$$
In words, the semi-invariants of the G distribution are the derivatives
of μ(s) for the F distribution evaluated at s_0. In particular, the mean and
variance of the G distribution are μ'(s_0) and μ''(s_0). The mean and
variance of the G_n distribution are, similarly, nμ'(s_0) and nμ''(s_0).
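As a check on the statement that the semi-invariants of G are the derivatives of μ at s_0, the sketch compares the mean and variance of the tilted atoms with finite-difference derivatives of μ. The distribution and the step size are illustrative assumptions.

```python
import math

F = {-1: 0.2, 0: 0.5, 2: 0.3}   # illustrative distribution (an assumption)


def mu(s: float) -> float:
    """mu(s) = log phi(s), the semi-invariant generating function of F."""
    return math.log(sum(p * math.exp(s * x) for x, p in F.items()))


def tilted_mean_var(s0: float) -> tuple[float, float]:
    """Mean and variance of G = T_{s0} F computed directly from its atoms."""
    phi = math.exp(mu(s0))
    g = {x: p * math.exp(s0 * x) / phi for x, p in F.items()}
    mean = sum(x * q for x, q in g.items())
    return mean, sum((x - mean) ** 2 * q for x, q in g.items())


def mu_derivs(s0: float, h: float = 1e-4) -> tuple[float, float]:
    """Central-difference estimates of mu'(s0) and mu''(s0)."""
    d1 = (mu(s0 + h) - mu(s0 - h)) / (2 * h)
    d2 = (mu(s0 + h) - 2 * mu(s0) + mu(s0 - h)) / h ** 2
    return d1, d2


for s0 in (-1.0, 0.0, 0.8):
    print(s0, tilted_mean_var(s0), mu_derivs(s0))
```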
Note that the operation of forming the new distribution function
G(x) (or the corresponding new random variable) from a given distribution
function F(x) (or its random variable) is a group operation. Thus,
if we let T_s denote the operation which, applied to F(x), gives G(x),
$$T_s F(x) = \frac{\int_{-\infty}^{x} e^{sy}\, dF(y)}{\int_{-\infty}^{\infty} e^{sy}\, dF(y)},$$
then the T_s form an additive Abelian group isomorphic to the additive group
of real numbers:
$$T_{s_1} T_{s_2} = T_{s_1 + s_2}, \qquad T_0 = I.$$
The operation T_s is distributive over the binary operation of convolution
(which itself is commutative and associative). Thus, if we denote convolution
of two distribution functions by an asterisk and repeated convolution of
the same distribution by an asterisk preceding the exponent, we have
$$T_s(F * G) = (T_s F) * (T_s G),$$
$$T_s(F^{*n}) = (T_s F)^{*n}.$$
This last equation, when we operate on both sides by T_{−s}, gives the basic
result we have used in estimating tails of distributions:
$$F^{*n} = T_{-s}\big[(T_s F)^{*n}\big].$$
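The group and distributive properties of T_s can be verified on discrete distributions. This is a minimal sketch using two arbitrary illustrative distributions. `T` and `close` are hypothetical helpers, and the last check is the identity F^{*n} = T_{−s}[(T_s F)^{*n}] for n = 3.

```python
import math


def convolve(d1, d2):
    """Convolution of two discrete distributions given as {value: prob}."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


def T(s, dist):
    """The operator T_s on a discrete distribution {value: probability}."""
    phi = sum(p * math.exp(s * x) for x, p in dist.items())
    return {x: p * math.exp(s * x) / phi for x, p in dist.items()}


def close(d1, d2, tol=1e-12):
    """Same support and atom-by-atom agreement within tol."""
    return d1.keys() == d2.keys() and all(abs(d1[x] - d2[x]) < tol for x in d1)


F = {-1: 0.2, 0: 0.5, 2: 0.3}   # illustrative distributions (assumptions)
G = {0: 0.6, 1: 0.4}
Fn = convolve(convolve(F, F), F)
print(close(T(0.3, T(-1.1, F)), T(-0.8, F)))                            # T_{s1} T_{s2} = T_{s1+s2}
print(close(T(0.0, F), F))                                              # T_0 = I
print(close(T(0.7, convolve(F, G)), convolve(T(0.7, F), T(0.7, G))))    # distributive
TF = T(0.7, F)
print(close(Fn, T(-0.7, convolve(convolve(TF, TF), TF))))               # F^{*3} = T_{-s}[(T_s F)^{*3}]
```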
If we think of the operation T_s F = G as producing a new probability
measure for the random variable x, then there is a one-to-one correspondence
between points in the two probability spaces involved, the F space and the
G space, and also between points in the product spaces of F with itself
n times and G with itself n times. The probability measures in the two
spaces are very closely related. If a point in the F space has value x and
probability P, the corresponding point in the G space has value x and
probability Q = e^{sx}P / ∫e^{sx} dF(x). If we select a subset S of points
whose x values all lie between A and B, then we will have, for s ≥ 0,
$$k\,e^{sA}\Pr_F(S) \le \Pr_G(S) \le k\,e^{sB}\Pr_F(S),$$
where k^{−1} = ∫e^{sx} dF(x).
The Chernoff Inequality
To illustrate the use of the G distribution in estimating the tail of
the F_n distribution, we will first give a crude but simple and useful bound
on the tail due to Chernoff, who proved it by a different method. We have
$$F_n(x) = e^{n\mu(s_0)} \int_{-\infty}^{x} e^{-s_0 y}\, dG_n(y).$$
If s_0 ≤ 0, the maximum of e^{−s_0 y} over y ≤ x occurs at y = x. Thus
$$F_n(x) \le e^{n\mu(s_0) - s_0 x} \int_{-\infty}^{x} dG_n(y) \le e^{n\mu(s_0) - s_0 x}, \qquad s_0 \le 0.$$
This is true for any x and any s_0 ≤ 0, but to obtain the most favorable bound
we should choose s_0 so as to minimize nμ(s_0) − xs_0 (for the x in question).
Remembering that μ(s) is analytic and that μ''(s) > 0 (since it is a variance),
the necessary and sufficient condition for a minimum is that nμ'(s_0) = x.
This will have a unique solution in s_0. However, it is more convenient to
express our result parametrically in terms of s_0 or, dropping the subscript,
in terms of s. Thus
$$F_n(n\mu'(s)) \le e^{n[\mu(s) - s\mu'(s)]}, \qquad s \le 0.$$
In a similar fashion, by integrating from x to ∞, we obtain a bound of
exactly the same type on the tail in the positive direction. Combining
these results, we have the following. If F_n(x) is the distribution function
of the sum of n independent, identically distributed random variables, each with log
moment generating function μ(s) which exists for A < s < B, then
$$F_n(n\mu'(s)) \le e^{n[\mu(s) - s\mu'(s)]}, \qquad A < s \le 0,$$
$$1 - F_n(n\mu'(s)) \le e^{n[\mu(s) - s\mu'(s)]}, \qquad 0 \le s < B.$$
These bounds are very useful in that they are relatively simple to
compute. Furthermore, they are asymptotically correct in the sense that, as n
becomes large, the logarithms of the bounds are asymptotic to the logarithms of
F_n or 1 − F_n (in the respective ranges), as we will see later. Thus, if we
are interested only in the logarithm of F_n for large n, the Chernoff
bound gives the correct asymptotic behavior.
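To see the Chernoff bound in action, the sketch below compares it with the exact binomial tail for sums of Bernoulli(p) variables, for which μ(s) = log(1 − p + pe^s). The choice of Bernoulli summands, n and p is an illustrative assumption, and 1 − F_n(x) is taken as Pr{S_n > x}.

```python
import math


def chernoff_upper_tail(n: int, p: float, s: float) -> tuple[float, float]:
    """For a sum of n Bernoulli(p) variables and s > 0, return (x, bound) with
    x = n mu'(s) and bound = exp(n[mu(s) - s mu'(s)]) >= 1 - F_n(x)."""
    q = 1 - p + p * math.exp(s)
    mu, mu1 = math.log(q), p * math.exp(s) / q   # mu(s) and mu'(s)
    return n * mu1, math.exp(n * (mu - s * mu1))


def binom_tail_above(n: int, p: float, x: float) -> float:
    """Exact 1 - F_n(x) = Pr{S_n > x} for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(math.floor(x) + 1, n + 1))


n, p = 100, 0.3
for s in (0.2, 0.6, 1.0):
    x, bound = chernoff_upper_tail(n, p, s)
    exact = binom_tail_above(n, p, x)
    print(f"s={s}: x={x:.2f}  exact={exact:.3e}  Chernoff={bound:.3e}")
```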
In the following sections we obtain sharper estimates of F_n(x) and 1 − F_n(x):
upper and lower bounds, and asymptotic formulas.
Upper and Lower Bounds on the Tails of Distributions
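Theorem: The distribution of the sum of n identically distributed independent
random variables satisfies
$$F_n(n\mu'(s)) \le e^{n[\mu(s) - s\mu'(s)]}\left[\frac{1}{|s|\sqrt{2\pi n\mu''(s)}} + \frac{2c\rho}{\sqrt{n}}\right], \qquad A < s < 0,$$
$$1 - F_n(n\mu'(s)) \le e^{n[\mu(s) - s\mu'(s)]}\left[\frac{1}{s\sqrt{2\pi n\mu''(s)}} + \frac{2c\rho}{\sqrt{n}}\right], \qquad 0 < s < B,$$
where
$$\rho \le \frac{\left[\mu^{iv}(s) + 3\mu''(s)^2\right]^{3/4}}{\mu''(s)^{3/2}},$$
μ(s) is the log moment generating function of F(x), μ'' and μ^{iv} are
its derivatives, and c is an absolute constant: the constant in the Berry
theorem relating to the approximation in the central limit theorem, with
error less than or equal to cρ/√n. Also, c may be replaced in the inequality
by 3 log n.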
Proof: We have
$$1 - F_n(n\mu'(s)) = e^{n\mu(s)} \int_{n\mu'(s)}^{\infty} e^{-sx}\, dG_n(x).$$
On making the substitution
$$y = \frac{x - n\mu'(s)}{\sqrt{n\mu''(s)}}$$
and writing H_n(y) for G_n(√(nμ''(s)) y + nμ'(s)), we obtain an H_n
distribution, with mean at zero and variance one, suitable for application of
ordinary central limit results. The equality above becomes
$$1 - F_n(n\mu'(s)) = e^{n[\mu(s) - s\mu'(s)]} \int_{0}^{\infty} e^{-s\sqrt{n\mu''(s)}\,y}\, dH_n(y).$$
H_n(y) can be estimated from the Cramér–Berry–Esseen theorem. Thus
$$H_n(y) = \Phi(y) + B(y), \qquad |B(y)| \le \frac{c\rho}{\sqrt{n}},$$
where ρ = β_3/[μ''(s)]^{3/2} and β_3 is the third absolute moment (about the mean) of the G distribution.
The integral then breaks into two parts. First we have:
$$\int_0^\infty e^{-s\sqrt{n\mu''(s)}\,y}\, d\Phi(y) = \frac{1}{\sqrt{2\pi}}\int_0^\infty e^{-y^2/2 - s\sqrt{n\mu''(s)}\,y}\, dy = e^{s^2 n\mu''(s)/2}\left[1 - \Phi\big(s\sqrt{n\mu''(s)}\big)\right] \le \frac{1}{s\sqrt{2\pi n\mu''(s)}},$$
the last step by the standard inequality 1 − Φ(x) ≤ e^{−x²/2}/(x√(2π)), x > 0.
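The second integral, involving dB(y), may be bounded by integrating by
parts:
$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dB(y) = \Big[e^{-s\sqrt{n\mu''}\,y}\, B(y)\Big]_0^\infty + s\sqrt{n\mu''}\int_0^\infty B(y)\, e^{-s\sqrt{n\mu''}\,y}\, dy \le \frac{c\rho}{\sqrt{n}} + s\sqrt{n\mu''}\,\frac{c\rho}{\sqrt{n}}\,\frac{1}{s\sqrt{n\mu''}} = \frac{2c\rho}{\sqrt{n}}.$$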
Collecting these terms, we obtain a bound for the tail of the distribution:
$$1 - F_n(n\mu'(s)) \le e^{n[\mu(s) - s\mu'(s)]}\left[\frac{1}{s\sqrt{2\pi n\mu''(s)}} + \frac{2c\rho}{\sqrt{n}}\right].$$
By a well-known inequality, β_3 ≤ β_4^{3/4}, where β_4 = μ^{iv} + 3(μ'')² is the fourth central
moment of the G distribution. Consequently
$$\rho \le \frac{\left(\mu^{iv} + 3\mu''^2\right)^{3/4}}{(\mu'')^{3/2}}.$$
This results in the final inequality, involving only n, s, and μ and its derivatives (together
with the unknown absolute constant c). Since the original Lyapunov
theorem (with constant estimated by Cramér) gives an inequality for B(y)
as follows,
$$|B(y)| \le \frac{3\rho\log n}{\sqrt{n}},$$
we may, in our inequalities, replace c by 3 log n. This makes them
completely definite, although at a certain loss in order of magnitude as a
function of n when n is large.
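The Gaussian integral used in the upper bound, and the inequality applied to it, can be checked numerically. This sketch relies only on the standard identity obtained by completing the square. The Simpson-rule helper is an independent cross-check, not part of the method. The lower curve (1 − 1/a²)/(a√(2π)) comes from the companion inequality 1 − Φ(a) ≥ φ(a)(1/a − 1/a³).

```python
import math


def upper_tail(x: float) -> float:
    """1 - Phi(x), computed with erfc to keep accuracy far out in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def tilted_integral(a: float) -> float:
    """int_0^inf e^{-a y} dPhi(y) = e^{a^2/2} (1 - Phi(a)), by completing the square."""
    return math.exp(a * a / 2) * upper_tail(a)


def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson rule (n even); an independent numerical cross-check only."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def density(y: float, a: float = 1.0) -> float:
    """Integrand e^{-a y} phi(y) for the cross-check."""
    return math.exp(-a * y - y * y / 2) / math.sqrt(2 * math.pi)


print(tilted_integral(1.0), simpson(density, 0.0, 12.0))
for a in (1.0, 2.0, 5.0, 10.0):
    upper = 1 / (a * math.sqrt(2 * math.pi))   # the bound 1/(a sqrt(2 pi)) used in the text
    lower = upper * (1 - 1 / a ** 2)           # companion lower bound
    print(a, lower, tilted_integral(a), upper)
```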
To estimate a lower bound on the tail, the method is identical up to
the point where we must estimate the following integral:
$$\int_0^\infty e^{-s\sqrt{n\mu''(s)}\,y}\, dH_n(y).$$
Again using the theorem involving cρ/√n, it is evident that the monotone
increasing function H_n(y) which would minimize this integral, subject to
being within cρ/√n of Φ(y), would be that shown in the figure.
This function starts at zero as high above Φ(y) as possible and is
constant at this value as long as possible. It then increases as slowly as
possible. It is easily shown that any other permissible H_n(y) gives a
larger integral than this function: when changed into this function in the
obvious way, the integral is decreased. In the figure the corner in the curve
occurs at A, which is such that Φ(A) − Φ(0) = 2cρ/√n. To obtain a simple
estimate of A which is on the safe side (that is, larger than the actual
A), we may approximate Φ(y) by a straight line passing through ½ at y = 0
with slope ¼. This will lie below Φ(y) out to y = 1.86. Hence if the
A computed from this straight line, namely A = 8cρ/√n, is less than or equal
to 1.86, the estimate is safe. If not, the more elaborate formula involving
Φ(A) may be used. In any case, our lower bound integral becomes
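$$\int_A^\infty e^{-s\sqrt{n\mu''(s)}\,y}\, d\Phi(y).$$
On completing the square, as before, this becomes, with a = s√(nμ''(s)),
$$e^{a^2/2}\left[1 - \Phi(a + A)\right] \ge \frac{e^{-aA - A^2/2}}{\sqrt{2\pi}\,(a + A)}\left[1 - \frac{1}{(a + A)^2}\right],$$
the last step by the inequality 1 − Φ(z) ≥ e^{−z²/2}(1/z − 1/z³)/√(2π).
Collecting these results, we have the following.
Theorem: If A = 8cρ/√n is less than 1.86, then, writing a = |s|√(nμ''(s)),
$$F_n(n\mu'(s)) \ge e^{n[\mu(s) - s\mu'(s)]}\,\frac{e^{-aA - A^2/2}}{\sqrt{2\pi}\,(a + A)}\left[1 - \frac{1}{(a + A)^2}\right], \qquad s < 0,$$
$$1 - F_n(n\mu'(s)) \ge e^{n[\mu(s) - s\mu'(s)]}\,\frac{e^{-aA - A^2/2}}{\sqrt{2\pi}\,(a + A)}\left[1 - \frac{1}{(a + A)^2}\right], \qquad s > 0.$$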
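The two numbers in the safe-side estimate, the crossing point 1.86 and the factor 8 in A = 8cρ/√n, can be checked numerically. In this sketch `eps` stands for cρ/√n, and the grid step and bisection bracket are arbitrary choices.

```python
import math


def Phi(y: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))


def line_stays_below_until(step: float = 1e-4) -> float:
    """Walk along y >= 0 while the line 1/2 + y/4 stays at or below Phi(y)."""
    y = 0.0
    while 0.5 + (y + step) / 4 <= Phi(y + step):
        y += step
    return y


def corner_exact(eps: float) -> float:
    """Solve Phi(A) - Phi(0) = 2 eps by bisection; eps stands for c rho / sqrt(n)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) - 0.5 < 2 * eps:
            lo = mid
        else:
            hi = mid
    return hi


print(line_stays_below_until())             # where the line of slope 1/4 crosses Phi
for eps in (0.001, 0.01, 0.05):
    print(eps, corner_exact(eps), 8 * eps)  # exact corner vs. the safe estimate 8 c rho / sqrt(n)
```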
Asymptotic Behavior of the Distribution Function
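Theorem: Let n independent random variables have the same distribution function
F(x), with the logarithm of the moment generating function, μ(s), existing for
A < s < B, where A < 0 < B. Let F_n(x) be the distribution function for
the sum of these random variables.
(1) If F(x) is not a lattice distribution, we have, asymptotically
as n → ∞,
$$F_n(n\mu'(s)) \sim \frac{e^{n[\mu(s) - s\mu'(s)]}}{|s|\sqrt{2\pi n\mu''(s)}}, \qquad A < s < 0,$$
$$1 - F_n(n\mu'(s)) \sim \frac{e^{n[\mu(s) - s\mu'(s)]}}{s\sqrt{2\pi n\mu''(s)}}, \qquad 0 < s < B.$$
(2) If F(x) is a lattice distribution with maximum span h, and Δ
is the distance from nμ'(s) to the next lattice point in the
direction away from the mean, then, asymptotically as n → ∞,
$$F_n(n\mu'(s)) \sim \frac{h\,e^{-|s|\Delta}}{1 - e^{-h|s|}}\,\frac{e^{n[\mu(s) - s\mu'(s)]}}{\sqrt{2\pi n\mu''(s)}}, \qquad A < s < 0,$$
$$1 - F_n(n\mu'(s)) \sim \frac{h\,e^{-s\Delta}}{1 - e^{-hs}}\,\frac{e^{n[\mu(s) - s\mu'(s)]}}{\sqrt{2\pi n\mu''(s)}}, \qquad 0 < s < B.$$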
Proof: Consider first the non-lattice case. The two results, s > 0 and
s < 0, are substantially the same. We prove the s > 0 case. As in the theorems
giving upper and lower bounds, a change of variable, y = (x − nμ'(s))/√(nμ''(s)), reduces
the problem to that of estimating the following integral:
$$\int_0^\infty e^{-s\sqrt{n\mu''(s)}\,y}\, dH_n(y).$$
We now use the Cramér–Esseen theorem (Gnedenko and Kolmogoroff, p. 210),
which states, in effect, that for any ε > 0 there exists n_0 such that
when n > n_0 we have
$$H_n(y) = \Phi(y) + U(y) + B(y),$$
with |B(y)| < ε/√n. Thus the integral may be written as a sum of three integrals:
$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, d\Phi(y) + \int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dU(y) + \int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dB(y),$$
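where
$$U(y) = \frac{\alpha_3}{6\sigma^3\sqrt{2\pi n}}\,(1 - y^2)\,e^{-y^2/2},$$
with σ² = μ''(s) and α_3 = μ'''(s) the variance and third central moment of the G distribution.
The first integral may be evaluated exactly,
on completing the square in the exponent. Its value is
$$e^{s^2 n\mu''(s)/2}\left[1 - \Phi\big(s\sqrt{n\mu''(s)}\big)\right].$$
Using the well-known asymptotic formula for 1 − Φ(x), this expression is
asymptotic to
$$\frac{1}{s\sqrt{2\pi n\mu''(s)}}.$$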
The second integral is o(1/√n). In fact, let the integral be divided into two
ranges, 0 to n^{−1/6} and n^{−1/6} to ∞.
The first of these is o(1/√n), because the total change in U(y) in the
interval is o(1/√n) while the integrand is bounded. Note that U(y), in addition
to having √n in the denominator, is flat at y = 0. Hence, as the interval of
integration approaches zero, ΔU(y) is o(1/√n).
The integral from n^{−1/6} to ∞ is clearly bounded by e^{−s√(nμ'') n^{−1/6}} K, where
K is the total variation of U(y). Since this latter is finite, and in
fact even approaches zero as n increases, the term in question is
certainly o(1/√n).
Finally, the last of the three integrals is clearly bounded by 2ε/√n
(integrating by parts as before) and consequently, since ε is arbitrary, is o(1/√n). Thus we conclude that
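$$\int_0^\infty e^{-s\sqrt{n\mu''(s)}\,y}\, dH_n(y) \sim \frac{1}{s\sqrt{2\pi n\mu''(s)}}$$
and, hence, that as n → ∞ the tail of the original distribution with
s > 0 has the following asymptotic formula:
$$1 - F_n(n\mu'(s)) \sim \frac{e^{n[\mu(s) - s\mu'(s)]}}{s\sqrt{2\pi n\mu''(s)}}.$$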
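The following numerical illustration of case (1) of the theorem assumes i.i.d. exponential summands with density e^{−x}, x > 0. This is a non-lattice distribution with μ(s) = −log(1 − s) for s < 1. The exact tail of the sum comes from the Poisson identity for the gamma distribution, and the ratio of exact to asymptotic values should move toward 1 as n grows. The helper names are hypothetical.

```python
import math


def gamma_tail_exact(n: int, x: float) -> float:
    """Pr{X_1 + ... + X_n > x} for i.i.d. Exp(1) variables, from the Poisson identity
    Pr{Gamma(n, 1) > x} = sum_{k < n} e^{-x} x^k / k!  (terms combined in log space)."""
    logs = [k * math.log(x) - math.lgamma(k + 1) - x for k in range(n)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)


def nonlattice_asymptotic(n: int, s: float) -> tuple[float, float]:
    """(x, e^{n(mu - s mu')} / (s sqrt(2 pi n mu''))) for Exp(1), mu(s) = -log(1 - s), 0 < s < 1."""
    mu, mu1, mu2 = -math.log(1 - s), 1 / (1 - s), 1 / (1 - s) ** 2
    return n * mu1, math.exp(n * (mu - s * mu1)) / (s * math.sqrt(2 * math.pi * n * mu2))


for n in (100, 400, 1600):
    x, approx = nonlattice_asymptotic(n, 0.2)
    print(n, x, gamma_tail_exact(n, x) / approx)   # ratio of exact to asymptotic value
```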
The analysis for the case of a lattice distribution is quite similar
but involves another term. We use the theorem of Esseen (Gnedenko and
Kolmogoroff, p. 213), which may be phrased for our purposes as follows.
For any ε > 0 there exists n_0 such that when n > n_0 we have
$$H_n(y) = \Phi(y) + \frac{\alpha_3}{6\sigma^3\sqrt{2\pi n}}\,(1 - y^2)\,e^{-y^2/2} + \frac{h}{\sigma\sqrt{2\pi n}}\, S\!\left(\frac{y\sigma\sqrt{n} - \Delta}{h}\right) e^{-y^2/2} + B(y),$$
with |B(y)| < ε/√n. In this formula σ² is the second moment and α_3 the third
moment (about the mean) of the G distribution underlying H_n. Also, h is the maximum span, that is, the
largest distance such that all jumps of the G distribution occur at
multiples of this from each other. Δ is the distance from nμ'(s) to the first jump
of the G_n distribution in the positive direction. Finally, S(Z) = [Z] − Z + ½, that
is, a saw-tooth function which jumps vertically from −½ to +½ at the
integer values of Z and decreases linearly with slope −1 between the
integers.
To estimate
$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dH_n(y),$$
we observe first that three of
the terms are identical with those involved in the non-lattice case, and
consequently the integral with respect to these functions is asymptotic
to 1/(s√(2πnμ''(s))). The only term to be evaluated is that involving the S
function. This can be written as the sum of two integrals on taking the
differential of the product:
$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\,\frac{h}{\sigma\sqrt{2\pi n}}\, S\!\left(\frac{y\sigma\sqrt{n} - \Delta}{h}\right) de^{-y^2/2} + \int_0^\infty e^{-s\sqrt{n\mu''}\,y}\,\frac{h}{\sigma\sqrt{2\pi n}}\, e^{-y^2/2}\, dS\!\left(\frac{y\sigma\sqrt{n} - \Delta}{h}\right).$$
The first integral is o(1/√n). This can be seen by dividing the range of
integration into 0 to n^{−1/6} and n^{−1/6} to ∞, as before. The argument is
essentially the same. In the first interval, the integral is small because of
the flatness of e^{−y²/2} and because of the √n in the denominator. In the
second interval, the term e^{−s√(nμ'')y} forces the integral to be small. The
second integral above, integrating on dS, can be divided into an infinite
sum over the jump points of S and an integral in d((yσ√n − Δ)/h) over the
sloping parts of S. The infinite sum is
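$$\sum_i \frac{h}{\sqrt{2\pi n\mu''}}\, e^{-s\sqrt{n\mu''}\,y_i}\, e^{-y_i^2/2},$$
where the summation is over the y_i which make the argument of the S function
an integer:
$$\frac{y_i\,\sigma\sqrt{n} - \Delta}{h} = K,$$
where K is an integer, or
$$y_i = \frac{hK + \Delta}{\sigma\sqrt{n}}.$$
Thus the sum becomes
$$\frac{h}{\sqrt{2\pi n\mu''}}\sum_{K \ge 0} e^{-s(hK + \Delta)}\, e^{-(hK + \Delta)^2/2n\mu''}.$$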
To estimate this sum, we use again the device of dividing the range of
summation into a part with y_i from 0 to n^{−1/6} and a part with y_i from n^{−1/6} to ∞. In
the first of these sums, the exponential with the squared exponent approaches
the constant 1 for all K in the range, and the sum reduces essentially to
a geometric series. Asymptotically, then, the sum becomes
$$\frac{h\,e^{-s\Delta}}{1 - e^{-hs}}\,\frac{1}{\sqrt{2\pi n\mu''(s)}}.$$
We have still one further term to estimate, namely
$$-\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\,\frac{h}{\sigma\sqrt{2\pi n}}\, e^{-y^2/2}\,\frac{\sigma\sqrt{n}}{h}\, dy = -\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, d\Phi(y).$$
This term is exactly equal to the original dΦ(y) integral and opposite
in sign (since the saw-tooth slopes are in the negative direction). These
two terms therefore cancel each other, and we are left with only one term
of order 1/√n. The final answer, then, is that asymptotically, with large
n,
$$\int_0^\infty e^{-s\sqrt{n\mu''(s)}\,y}\, dH_n(y) \sim \frac{h\,e^{-\Delta s}}{1 - e^{-hs}}\,\frac{1}{\sqrt{2\pi n\mu''(s)}}.$$
This completes the proof of the theorem.
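The lattice formula can be checked in the same way with Bernoulli summands (span h = 1). This sketch assumes p = 1/2 and s = 1/2 purely for illustration. It takes Δ as the distance from nμ'(s) up to the next integer, so that the exact tail is Pr{S_n > nμ'(s)}. The helper names are hypothetical.

```python
import math


def binom_tail_above(n: int, p: float, x: float) -> float:
    """Exact Pr{S_n > x} for S_n ~ Binomial(n, p), summing log-space terms."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(math.floor(x) + 1, n + 1)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)


def lattice_asymptotic(n: int, p: float, s: float, h: float = 1.0) -> tuple[float, float]:
    """(x, h e^{-s Delta}/(1 - e^{-h s}) * e^{n(mu - s mu')}/sqrt(2 pi n mu'')) for
    Bernoulli(p) summands (lattice span h = 1) and s > 0."""
    q = 1 - p + p * math.exp(s)
    mu, mu1 = math.log(q), p * math.exp(s) / q
    mu2 = mu1 * (1 - mu1)                 # mu''(s) for a Bernoulli summand
    x = n * mu1
    delta = math.floor(x) + 1 - x         # distance to the next lattice point above x
    coeff = h * math.exp(-s * delta) / (1 - math.exp(-h * s))
    return x, coeff * math.exp(n * (mu - s * mu1)) / math.sqrt(2 * math.pi * n * mu2)


for n in (100, 1000):
    x, approx = lattice_asymptotic(n, 0.5, 0.5)
    print(n, round(x, 3), binom_tail_above(n, 0.5, x) / approx)
```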
It may be noted that if the coefficient h e^{−Δs}/(1 − e^{−hs}) be expanded in a
power series, the first terms are (1/s){1 + s(h/2 − Δ) + ...}. Hence, as h → 0
(and also, therefore, Δ → 0), the lattice result approaches the non-lattice
result, as is to be expected. It may also be seen that with Δ = h/2 the
lattice coefficient is a particularly close approximation to the non-lattice
coefficient, since the quantity h/2 − Δ then vanishes. Indeed, in this case,
the coefficient becomes (1/s)(1 − (hs)²/24 + ...).
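The expansion of the lattice coefficient can be confirmed numerically. The values of h, Δ and s below are arbitrary.

```python
import math


def lattice_coefficient(h: float, delta: float, s: float) -> float:
    """h e^{-delta s} / (1 - e^{-h s}), the lattice factor that replaces 1/s."""
    return h * math.exp(-delta * s) / (1 - math.exp(-h * s))


s = 0.5
for h in (1.0, 0.5, 0.1, 0.01):
    exact = lattice_coefficient(h, h / 2, s)            # the case delta = h/2
    series = (1 / s) * (1 - (h * s) ** 2 / 24)          # first terms of the expansion
    print(h, exact, series, 1 / s)
```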
Generalized Chebycheff and Chernoff Inequalities
Suppose we have a random vector (x_1, x_2, ..., x_k). Let φ(u_1, u_2, ..., u_k)
be everywhere non-negative and monotone increasing in all the u_i, and
assume E[φ(x_1, x_2, ..., x_k)] exists. Then
$$\Pr\{x_i \ge t_i\ (i = 1, 2, \ldots, k)\} \le \frac{E[\varphi(x_1, x_2, \ldots, x_k)]}{\varphi(t_1, t_2, \ldots, t_k)}. \qquad (1)$$
If we choose for φ the function
$$\varphi(u_1, u_2, \ldots, u_k) = e^{s_1 u_1 + s_2 u_2 + \cdots + s_k u_k} \qquad (\text{all } s_i \ge 0)$$
and let μ(s_1, s_2, ..., s_k) = log E(φ) = log (moment generating function of
the distribution), then (1) becomes
$$\Pr\{x_i \ge t_i\ (i = 1, 2, \ldots, k)\} \le e^{\mu(s_1, \ldots, s_k) - \sum_i s_i t_i}. \qquad (2)$$
This bound is minimized by choosing the s_i to satisfy
$$\frac{\partial \mu}{\partial s_i} = t_i \qquad (i = 1, 2, \ldots, k). \qquad (3)$$
If the random vectors are the sum of n independent random vectors, each
with the same distribution, then the μ*(s_1, ..., s_k), say, for the sum vector is
nμ(s_1, ..., s_k), where μ is the log (moment generating function) for the individual
random vectors. The above result may then be translated into
$$\Pr\{x_i^* \ge n t_i\ (i = 1, 2, \ldots, k)\} \le e^{n[\mu(s_1, \ldots, s_k) - \sum_i s_i t_i]},$$
with the best choice of s_i being those which satisfy ∂μ/∂s_i = t_i.
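Here is a minimal two-dimensional instance of the bound. For illustration only, it assumes summand vectors with independent Bernoulli(p_i) components. Then μ is a sum of one-dimensional terms, equations (3) solve in closed form, and the exact joint probability factorizes into binomial tails. The helper names are hypothetical.

```python
import math


def bernoulli_mu(p: float, s: float) -> float:
    """log E[e^{s x}] for a Bernoulli(p) component."""
    return math.log(1 - p + p * math.exp(s))


def joint_chernoff(n: int, ps: list[float], ts: list[float]) -> tuple[float, list[float]]:
    """Bound on Pr{x*_i >= n t_i for all i} when each summand vector has independent
    Bernoulli(p_i) components (an illustrative assumption), with p_i < t_i < 1.

    mu(s) = sum_i log(1 - p_i + p_i e^{s_i}); the equations d mu / d s_i = t_i give
    s_i = log(t_i (1 - p_i) / (p_i (1 - t_i))), all positive when t_i > p_i.
    """
    s = [math.log(t * (1 - p) / (p * (1 - t))) for p, t in zip(ps, ts)]
    exponent = sum(bernoulli_mu(p, si) - si * t for p, si, t in zip(ps, s, ts))
    return math.exp(n * exponent), s


def binom_tail_at_least(n: int, p: float, k0: int) -> float:
    """Exact Pr{S_n >= k0} for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k0, n + 1))


n, ps, ts = 60, [0.3, 0.5], [0.45, 0.65]
bound, s = joint_chernoff(n, ps, ts)
exact = 1.0
for p, t in zip(ps, ts):   # independent components: the joint event factorises
    exact *= binom_tail_at_least(n, p, math.ceil(n * t - 1e-9))
print(exact, bound, s)
```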
Channels with Side Information at the Transmitter
Claude E. Shannon
(1)
Channels with feedback from the receiving to the transmitting
point are a special case of a situation in which there is additional informa-
tion available at the transmitter which may be used as an aid in the forward
transmission system. In Fig. 1 the channel has an input x and an output y.
[Fig. 1: encoder and channel, with the side information u returned from the channel to the encoder]
There is a second output from the channel, u, available at the transmitting
point, which may be used in the coding process. Thus the encoder has as
inputs the message to be transmitted, m, and the side information u. The
sequence of input letters x to the channel will be a function of the
available part (that is, the past up to the current time) of these signals.
The signal u might be the received signal y, it might be a noisy
version of this signal, or it might not relate to y but be statistically
correlated with the general state of the channel. As a practical example,
a transmitting station might have available a receiver for testing the current
noise conditions at different frequencies. These results would be used to
choose the frequency for transmission.
A simple discrete channel with side information is shown in Fig. 2.
[Fig. 2: mod 2 adder and random (0, 1) device]
In this channel, x, y and u are all binary variables; they can be either
zero or one. The channel can be used once each second. Immediately after
it is used, the random device chooses a zero or one independently of
previous choices and with probabilities 1/2, 1/2. This value of u then
appears at the transmitting point. The next x that is sent is added
in the channel modulo 2 to this value of u to give the received y. If
the u side information were not available at the transmitter, the channel
would be that of Fig. 3, a channel with capacity zero.
[Fig. 3: binary channel with all transition probabilities 1/2]
However, with the side information available,
it is possible to send one bit per second through the channel. The u
information is used to compensate for the noise inside by a preliminary
reversal of zero and one, as in Fig. 4.
[Fig. 4: preliminary reversal of 0 and 1 controlled by u]
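The following simulates the Fig. 2 channel with and without the side information, assuming a fair random device as described. The function names are hypothetical.

```python
import random


def send_with_side_info(message_bits: list[int], rng: random.Random) -> list[int]:
    """Fig. 2 channel, y = x XOR u, with u shown to the transmitter before x is chosen.
    The encoder pre-reverses the bit (x = m XOR u), so y = m: one bit per use."""
    received = []
    for m in message_bits:
        u = rng.randint(0, 1)    # random device output, available at the transmitter
        x = m ^ u                # preliminary reversal of 0 and 1 when u = 1
        received.append(x ^ u)   # the channel adds u modulo 2
    return received


def send_without_side_info(message_bits: list[int], rng: random.Random) -> list[int]:
    """Same channel when u is not available: the transmitter sends x = m (Fig. 3)."""
    return [m ^ rng.randint(0, 1) for m in message_bits]


rng = random.Random(7)
msg = [rng.randint(0, 1) for _ in range(10_000)]
with_u = send_with_side_info(msg, rng)
without_u = send_without_side_info(msg, rng)
agree = sum(a == b for a, b in zip(without_u, msg)) / len(msg)
print(with_u == msg, agree)   # perfect transmission vs. agreement near 1/2 (no information)
```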
Without studying the problem of side information in its fullest
generality, which would involve possible historical effects in the channel,
possibly infinite input and output alphabets, etc., we shall consider a
moderately general case for which a simple solution has been found. See
also in this connection Silverman (2).
The memoryless discrete channel with side state information.
We consider a channel which has a finite number of possible states, $s_1, s_2, \ldots, s_g$. At each use of the channel a new state is chosen, with probability $q_t$ for state $s_t$. This choice is statistically independent of previous states and previous input or output letters in the channel. The state is available as side information $u$ at the transmitting point. When in state $s_t$ the channel acts like a particular discrete channel $K_t$. Thus, its operation is defined by a set of transition probabilities $p_{ti}(j)$, $t = 1, 2, \ldots, g$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, where $a$ is the number of input letters and $b$ the number of output letters. Thus, abstractly, the channel is described by the set of state probabilities $q_t$ and transition probabilities $p_{ti}(j)$, with $q_t$ the probability of state $t$ and $p_{ti}(j)$ the probability, if in state $t$ and $i$ is transmitted, that $j$ will be received.
A block code with $M$ messages (the integers $1, 2, \ldots, M$) may be defined as follows for such a channel with side information. This definition, incidentally, is analogous to that for a channel with feedback given previously (1). If $n$ is the block length of the code, there are $n$ functions
$$f_1(m; u_1),\; f_2(m; u_1, u_2),\; f_3(m; u_1, u_2, u_3),\; \ldots,\; f_n(m; u_1, u_2, \ldots, u_n).$$
In these functions $m$ ranges over the set of possible messages. Thus $m = 1, 2, \ldots, M$. The $u_i$ all range over the possible side information alphabet. In the particular case here each $u_i = 1, 2, \ldots, g$. Each function takes values in the alphabet of input letters $x$ of the channel. The value $f_i(m; u_1, u_2, \ldots, u_i)$ is the input $x_i$ to be used in the code if the message is $m$ and the side information up to the time corresponding to $i$ consisted of $u_1, u_2, \ldots, u_i$. This is the mathematical equivalent of saying that a code consists of a way of determining, for each message $m$ and each history of side information up to the present, the next transmitted letter. The important feature here is that only the data available at the time $i$, namely $m, u_1, u_2, \ldots, u_i$, may be used in deciding the next transmitted letter $x_i$, not the side information $u_{i+1}, \ldots, u_n$ yet to appear.
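To make the causality requirement concrete, here is a small Python sketch, assuming states and input letters are coded as small integers and messages as $0, \ldots, M-1$; `encode_causally`, `decode` and the example functions are hypothetical. The encoder hands each $f_i$ only the prefix $u_1, \ldots, u_i$, so no letter can depend on side information yet to appear.

```python
from typing import Callable, List, Sequence

# f_i(m; u_1, ..., u_i): the message and the side information observed so far.
EncoderFn = Callable[[int, Sequence[int]], int]

def encode_causally(m: int, fs: Sequence[EncoderFn], states: Sequence[int]) -> List[int]:
    """Return x_1, ..., x_n; f_i is shown only u_1, ..., u_i, never later states."""
    return [f(m, tuple(states[: i + 1])) for i, f in enumerate(fs)]

# Hypothetical code for the Fig. 2 channel: n = 3, M = 8. Letter i carries
# bit i of m, reversed when the current state is 1, so that y_i = bit i of m.
n = 3
fs = [lambda m, us, i=i: ((m >> i) & 1) ^ us[-1] for i in range(n)]

def decode(ys: Sequence[int]) -> int:
    """Decoding function h: read the received bits back as an integer."""
    return sum(y << i for i, y in enumerate(ys))

states = [1, 0, 1]
xs = encode_causally(5, fs, states)
ys = [x ^ u for x, u in zip(xs, states)]
print(xs, ys, decode(ys))   # [0, 0, 0] [1, 0, 1] 5
```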
A decoding system for such a code consists of a mapping or function $h(y_1, y_2, \ldots, y_n)$ of received blocks of length $n$ into messages $m$; thus $h$ takes values from 1 to $M$. It is a way of deciding on a transmitted message given the received block $y_1, y_2, \ldots, y_n$. For a given set of probabilities of the messages, there will exist, for a given channel and coding and decoding system, a calculable probability of error $P_e$: the probability of a message being encoded and received in such a way that the function $h$ leads to deciding on a different message. We shall be concerned particularly with cases where the messages are equiprobable, each having probability $1/M$. The rate for such a code is $\frac{1}{n} \log M$. We are interested in the channel capacity $C$, that is, the largest rate $R$ such that it is possible to construct codes arbitrarily close to rate $R$ and with probability of error $P_e$ arbitrarily small.
It may be noted that if the state information were not available at the transmitting point, the channel would act like a memoryless channel with transition probabilities given by
$$P_i(j) = \sum_t q_t\, p_{ti}(j) \qquad (i = 1, \ldots, a;\; j = 1, \ldots, b).$$
Thus, the capacity $C_1$ under this condition could be calculated by the ordinary means for memoryless channels. On the other hand, if the state information were available both at transmitting and receiving points, it is easily shown that the capacity is then given by $C_2 = \sum_t q_t C_t$, where $C_t$ is the capacity of the memoryless channel with transition probabilities $p_{ti}(j)$. The situation we are interested in here is intermediate: the state information is available at the transmitting point but not at the receiving point.
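The two reference capacities can be estimated numerically. The sketch below swaps in the Blahut-Arimoto iteration (a standard method for memoryless channels, not one named in the text) in plain Python; `capacity_bits` and `averaged_channel` are hypothetical names, and `p[t][i][j]` stands for $p_{ti}(j)$ with 0-based indices. For the Fig. 2 channel it gives $C_1 = 0$ and $C_2 = 1$ bit.

```python
import math

def capacity_bits(P, iters=2000):
    """Blahut-Arimoto estimate of the capacity, in bits, of a memoryless channel.

    P[i][j] is the probability that input letter i is received as output letter j.
    """
    a, b = len(P), len(P[0])
    r = [1.0 / a] * a                                   # input distribution
    for _ in range(iters):
        q = [sum(r[i] * P[i][j] for i in range(a)) for j in range(b)]
        # divergence of row i from the current output distribution q
        d = [sum(P[i][j] * math.log2(P[i][j] / q[j]) for j in range(b) if P[i][j] > 0)
             for i in range(a)]
        w = [r[i] * 2.0 ** d[i] for i in range(a)]
        s = sum(w)
        r = [wi / s for wi in w]
    q = [sum(r[i] * P[i][j] for i in range(a)) for j in range(b)]
    return sum(r[i] * P[i][j] * math.log2(P[i][j] / q[j])
               for i in range(a) for j in range(b) if P[i][j] > 0)

def averaged_channel(q, p):
    """No state information anywhere: P_i(j) = sum_t q_t p_ti(j)."""
    g, a, b = len(p), len(p[0]), len(p[0][0])
    return [[sum(q[t] * p[t][i][j] for t in range(g)) for j in range(b)]
            for i in range(a)]

# Fig. 2 channel: in state t the output is the input plus t (mod 2).
q = [0.5, 0.5]
p = [[[1, 0], [0, 1]],    # state 0: y = x
     [[0, 1], [1, 0]]]    # state 1: y = x + 1 (mod 2)
C1 = capacity_bits(averaged_channel(q, p))
C2 = sum(qt * capacity_bits(pt) for qt, pt in zip(q, p))
print(round(C1, 6), round(C2, 6))   # 0.0 1.0
```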
Theorem. The capacity of a memoryless discrete channel $K$ with side state information, defined by $q_t$ and $p_{ti}(j)$, is equal to the capacity of the memoryless channel $K'$ (without side information) with the same output alphabet and an input alphabet with $a^g$ input letters $X = (x_1, x_2, \ldots, x_g)$, where each $x_t = 1, 2, \ldots, a$. The transition probabilities $r_X(y)$ for the channel $K'$ are given by
$$r_X(y) = r_{x_1, x_2, \ldots, x_g}(y) = \sum_t q_t\, p_{t x_t}(y).$$
Any code and decoding system for $K'$ can be translated into an equivalent code and decoding system for $K$ with the same probability of error. Any code for $K$ has an equivocation of message (conditional entropy per letter of the message given the received sequence) of at least $R - C$, where $C$ is the capacity of $K'$. Thus any code with rate $R > C$ has a probability of error bounded away from zero (independent of the block length $n$):
$$P_e \ge \frac{R - C}{6\left(R + \frac{1}{R}\ln\frac{R}{R - C}\right)}.$$
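The construction of $K'$ is mechanical: one input letter per function from states to inputs. A minimal sketch, assuming 0-based indices and exact `Fraction` arithmetic; `derived_channel` is a hypothetical name, and the numbers are those of the three-state example given later in the text.

```python
from fractions import Fraction
from itertools import product

def derived_channel(q, p):
    """Build K': one input letter X = (x_1, ..., x_g) per function from states to inputs.

    Returns a dict mapping X to its output distribution r_X(y) = sum_t q_t p_{t, x_t}(y).
    """
    g, a, b = len(p), len(p[0]), len(p[0][0])
    return {X: [sum(q[t] * p[t][X[t]][y] for t in range(g)) for y in range(b)]
            for X in product(range(a), repeat=g)}   # a**g letters in all

third, half = Fraction(1, 3), Fraction(1, 2)
q = [third] * 3
p = [[[1, 0, 0], [0, half, half]],      # state 1
     [[0, 1, 0], [half, 0, half]],      # state 2
     [[0, 0, 1], [half, half, 0]]]      # state 3
K_prime = derived_channel(q, p)
print(len(K_prime))            # 8
print(K_prime[(0, 1, 1)])      # [Fraction(2, 3), Fraction(1, 6), Fraction(1, 6)]
```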
It may be noted that this theorem reduces the analysis of the given channel $K$ with side information to a memoryless channel $K'$ with more input letters but without side information. One uses known methods to determine the capacity of this derived channel and this gives the capacity of the original channel. Furthermore, codes for the derived channel may be translated into codes for the original channel with identical probability of error. (Indeed, all statistical properties of the codes are identical.)
We first show how codes for $K'$ may be translated into codes for $K$. A code word for the derived channel $K'$ consists of a sequence of $n$ letters $X$ from the $X$ input alphabet of $K'$. A particular input letter $X$ of this channel may be recognized as a particular function from the state alphabet to the input alphabet $x$ of channel $K$. The full possible alphabet of $X$ consists of the full set of $a^g$ different possible functions from the state alphabet with $g$ values to the input alphabet with $a$ values. Thus, each letter $X = (x_1, x_2, \ldots, x_g)$ of a code word for $K'$ may be interpreted as a function from state $u$ to input alphabet $x$. The translation of codes consists merely of using the input $x$ given by this function of the state variable. Thus if the state variable $u$ has the value 1, then $x_1$ is used in channel $K$; if it were state $k$, then $x_k$. In other words, the translation is a simple letter-by-letter translation without memory effects depending on previous states.

The codes for $K'$ are really just another way of describing certain of the codes for $K$, namely those where the next input letter $x$ is a function only of the message $m$ and the current state $u$, and does not depend on the previous states.
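The letter-by-letter translation can be sketched in a few lines, assuming 0-based states and K' letters stored as tuples; `translate` is a hypothetical name. In the Fig. 2 channel the K' letter (1, 0) always yields y = 1 and (0, 1) always yields y = 0, so K' contains a noiseless binary channel.

```python
def translate(codeword, states):
    """Letter-by-letter translation of a K' code word into channel-K inputs.

    codeword[i] is a K' letter X = (x_1, ..., x_g), i.e. a table from state to
    input; states[i] is the state u observed at time i (0-based). Only the
    current state is consulted, never earlier ones.
    """
    return [X[u] for X, u in zip(codeword, states)]

# Fig. 2 channel: letter (1, 0) always produces y = 1, letter (0, 1) always y = 0.
codeword = [(1, 0), (0, 1), (1, 0)]
states = [0, 1, 1]
xs = translate(codeword, states)
ys = [x ^ u for x, u in zip(xs, states)]
print(xs, ys)   # [1, 1, 0] [1, 0, 1]
```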
It might be pointed out also that a simple physical device could be constructed which, placed ahead of the channel $K$, makes it look like $K'$. This device would have the $X$ alphabet for one input and the state alphabet for another (this input connected to the $u$ line of Fig. 1). Its output would range over the $x$ alphabet and be connected to the $x$ line of Fig. 1. Its operation would be to give an $x$ output corresponding to the $X$ function of the state $u$. It is clear that the statistical situations for $K$ and $K'$ with the translated code are identical. The probability of an input word for $K'$ being received as a particular output word is the same as that for the corresponding operation with $K$. This gives the first part of the theorem.
To prove the second part of the theorem, we will show that in the channel $K$, the change in conditional entropy (equivocation) of the message $m$ at the receiving point when a letter is received cannot exceed $C$ (the capacity of the channel $K'$). In Fig. 1, we let $m$ be the message; $x$, $y$, $u$ be the next input letter, output letter and state letter. Let $U$ be the past sequence of $u$ states from the beginning of the block code to the present, and $Y$ the sequence of output letters received before the current $y$. We are assuming here a given block code for encoding messages. The messages are chosen from a set with certain probabilities (not necessarily equal). Given the statistics of the message source, the coding system, and the statistics of the channel, these various entities $m$, $x$, $y$, $U$, $Y$ all belong to a probability space and the various probabilities involved in the following calculation are meaningful. Thus the equivocation of message when $Y$ has been received, $H(m \mid Y)$, is given by
$$H(m \mid Y) = -\sum_{m, Y} P(m, Y) \log P(m \mid Y) = -\langle \log P(m \mid Y) \rangle.$$
(The symbol $\langle G \rangle$ here and later means the average of $G$ over the probability space.) The change in equivocation when the next letter $y$ is received is
$$\begin{aligned}
H(m \mid Y) - H(m \mid Y, y) &= -\langle \log P(m \mid Y) \rangle + \langle \log P(m \mid Y, y) \rangle \\
&= \left\langle \log \frac{P(m, Y, y)\, P(Y)}{P(m, Y)\, P(Y, y)} \right\rangle = \left\langle \log \frac{P(y \mid m, Y)}{P(y \mid Y)} \right\rangle \\
&= \left\langle \log \frac{P(y \mid m, Y)}{P(y)} \right\rangle - \left\langle \log \frac{P(Y, y)}{P(Y)\, P(y)} \right\rangle. \qquad (1)
\end{aligned}$$
The term $\left\langle \log \frac{P(Y, y)}{P(Y) P(y)} \right\rangle$ is an average mutual information and therefore non-negative. Now note that by the independence requirements of our original system
$$P(y \mid x, u) = P(y \mid x, m, u, U) = P(y \mid x, m, u, U, Y).$$
Now since $x$ is a strict function of $m$, $u$ and $U$ (by the coding system function), we may omit it in the conditioning variables:
$$P(y \mid m, u, U) = P(y \mid m, u, U, Y), \qquad\text{that is,}\qquad \frac{P(y, m, u, U)}{P(m, u, U)} = \frac{P(y, m, u, U, Y)}{P(m, u, U, Y)}.$$
Since the new state $u$ is independent of the past, $P(m, u, U) = P(u) P(m, U)$ and $P(m, u, U, Y) = P(u) P(m, U, Y)$. Substituting and simplifying,
$$P(y, u \mid m, U) = P(y, u \mid m, U, Y).$$
Summing on $u$ gives
$$P(y \mid m, U) = P(y \mid m, U, Y).$$
Hence
$$H(y \mid m, U) = H(y \mid m, U, Y) \le H(y \mid m, Y),$$
$$-\langle \log P(y \mid m, U) \rangle \le -\langle \log P(y \mid m, Y) \rangle.$$
Using this in (1), and dropping the non-negative mutual information term,
$$H(m \mid Y) - H(m \mid Y, y) \le \left\langle \log \frac{P(y \mid m, U)}{P(y)} \right\rangle. \qquad (2)$$
We now wish to show that $P(y \mid m, U) = P(y \mid X)$. Here $X$ is a random variable specifying the function from $u$ to $x$ imposed by the encoding operation for the next input $x$ to the channel. Equivalently, $X$ corresponds to an input letter in the derived channel $K'$. We have $P(y \mid x, u) = P(y \mid x, u, m, U)$. Furthermore, the coding system used implies a functional relation for determining the next input letter $x$, given $m$, $U$ and $u$. Thus $x = f(m, U, u)$. If $f(m, U, u) = f(m', U', u)$ for two particular pairs $(m, U)$ and $(m', U')$ and for all $u$, then it follows that $P(y \mid m, U, u) = P(y \mid m', U', u)$ for all $u$ and $y$, since $m$, $U$ and $u$ lead to the same $x$ as $m'$, $U'$ and $u$. From this, since $u$ is independent of $(m, U)$, we obtain
$$P(y \mid m, U) = \sum_u P(u)\, P(y \mid m, U, u) = \sum_u P(u)\, P(y \mid m', U', u) = P(y \mid m', U').$$
In other words, $(m, U)$ pairs which give the same function $f(m, U, u)$ give the same value of $P(y \mid m, U)$ or, said another way, $P(y \mid m, U) = P(y \mid X)$.

Returning now to our inequality (2), we have
$$H(m \mid Y) - H(m \mid Y, y) \le \left\langle \log \frac{P(y \mid X)}{P(y)} \right\rangle \le \max_{P(X)} \left\langle \log \frac{P(y \mid X)}{P(y)} \right\rangle = C.$$
This is the desired inequality on the equivocation. The equivocation cannot be reduced by more than $C$, the capacity of the derived channel $K'$, for each received letter. In particular, in a block code with $M$ equiprobable messages, $R = \frac{1}{n} \log M$. If $R > C$, then at the end of the block the equivocation must still be at least $nR - nC$, since it starts at $nR$ and can be reduced by at most $C$ for each of the $n$ letters.

It is known that if the equivocation per letter is at least $R - C$, then the probability of error in decoding is at least
$$P_e \ge \frac{R - C}{6\left(R + \frac{1}{R}\ln\frac{R}{R - C}\right)}.$$
Thus the probability of error is bounded away from zero regardless of the block length $n$ if the code attempts to send at a rate $R > C$. This concludes the proof of the theorem.
As an example of this theorem, consider a channel with two output letters, any number $a$ of input letters and any number $g$ of states. Then the derived channel $K'$ has two output letters and $a^g$ input letters. However, in a channel with just two output letters, only two of the input letters need be used to achieve channel capacity, as shown in (3). Namely, we should use in $K'$ only the two letters with maximum and minimum transition probabilities to one of the output letters. These two may be found as follows. The transition probabilities for a particular letter of $K'$ are averages of the corresponding transitions for a set of letters for $K$, one for each state. To maximize the transition probability to one of the output letters, it is clear that we should choose in each state the letter with the maximum transition to that output letter. Similarly, to minimize, one chooses in each state the letter with the minimum transition probability to that letter. These two resulting letters in $K'$ are the only ones used, and the corresponding channel gives the desired channel capacity. Formally, then, if the given channel has probabilities $p_{ti}(1)$ in state $t$ for input letter $i$ to output letter 1, and $p_{ti}(2) = 1 - p_{ti}(1)$ to the other output letter 2, we calculate
$$p_1 = \sum_t q_t \max_i p_{ti}(1), \qquad p_2 = \sum_t q_t \min_i p_{ti}(1).$$
The channel $K'$ with two input letters having transition probabilities $p_1$ and $1 - p_1$, and $p_2$ and $1 - p_2$, to the two output letters respectively, has the channel capacity of the original channel $K$.
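The recipe for two output letters reduces to two weighted sums and the capacity of a 2 x 2 channel. A minimal sketch, assuming the capacity of the reduced channel is found by a ternary search over the input probability (valid because mutual information is concave in it); all names and the second, noisy channel are hypothetical illustrations.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def two_input_capacity(p1, p2, steps=200):
    """Capacity (bits) of the channel with rows (p1, 1 - p1) and (p2, 1 - p2).

    Mutual information is concave in the input probability w, so a ternary
    search over w in [0, 1] locates its maximum.
    """
    def info(w):
        return h2(w * p1 + (1 - w) * p2) - w * h2(p1) - (1 - w) * h2(p2)
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if info(m1) < info(m2):
            lo = m1
        else:
            hi = m2
    return info((lo + hi) / 2)

def two_output_side_info(q, p_to_1):
    """p_to_1[t][i] = p_ti(1). Returns p_1, p_2 and the capacity of K."""
    p1 = sum(qt * max(row) for qt, row in zip(q, p_to_1))
    p2 = sum(qt * min(row) for qt, row in zip(q, p_to_1))
    return p1, p2, two_input_capacity(p1, p2)

# Fig. 2 channel, with output letter 1 taken to be y = 0:
_, _, C_fig2 = two_output_side_info([0.5, 0.5], [[1, 0], [0, 1]])
# Hypothetical noisy two-state channel with two inputs:
p1, p2, C_noisy = two_output_side_info([0.5, 0.5], [[0.9, 0.2], [0.3, 0.6]])
print(round(C_fig2, 6), round(C_noisy, 4))   # 1.0 0.1887
```

In the noisy case $p_1 = 3/4$ and $p_2 = 1/4$, so the reduced channel is binary symmetric with crossover probability 1/4.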
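Another example, with three output letters, two input letters and three states, is the following. The probability matrices for the three states (rows are input letters, columns output letters; the states are assumed each to have probability 1/3) are:
$$\text{State 1: } \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 & \tfrac12 \end{pmatrix} \qquad \text{State 2: } \begin{pmatrix} 0 & 1 & 0 \\ \tfrac12 & 0 & \tfrac12 \end{pmatrix} \qquad \text{State 3: } \begin{pmatrix} 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & 0 \end{pmatrix}$$
In this case there are $2^3 = 8$ input letters in the derived channel $K'$. The matrix of these is as follows:
$$\begin{pmatrix}
\tfrac12 & \tfrac12 & 0 \\
0 & \tfrac12 & \tfrac12 \\
\tfrac12 & 0 & \tfrac12 \\
\tfrac23 & \tfrac16 & \tfrac16 \\
\tfrac16 & \tfrac23 & \tfrac16 \\
\tfrac16 & \tfrac16 & \tfrac23 \\
\tfrac13 & \tfrac13 & \tfrac13 \\
\tfrac13 & \tfrac13 & \tfrac13
\end{pmatrix}$$
If there are only three output letters, one need use only three input letters to achieve channel capacity, and in this case it is readily shown that the first three can (and in fact must) be used. Due to the symmetry, these three letters must be used with equal probability and the resulting channel capacity is $\log(3/2)$.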
In the original channel, it is easily seen that, if the state information were not available, the channel would act like one with the transition matrix
$$\begin{pmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{pmatrix}.$$
This channel clearly has zero capacity. On the other hand, if the state information were available at the receiving point or at both the receiving point and the transmitting point, the two input letters can be perfectly distinguished and the channel capacity is $\log 2$.
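As a numerical check on this example, the sketch below builds the eight-letter derived channel and estimates its capacity with the Blahut-Arimoto iteration (swapped in here as a convenient general method; the text argues by symmetry instead). Function names are hypothetical and the estimate is only approximate, so it is compared with $\log_2(3/2)$ rather than printed as an exact value.

```python
import math
from itertools import product

def derived_channel(q, p):
    """Rows r_X(y) = sum_t q_t p_{t, x_t}(y), one per X in {0, ..., a-1}^g."""
    g, a, b = len(p), len(p[0]), len(p[0][0])
    return [[sum(q[t] * p[t][X[t]][y] for t in range(g)) for y in range(b)]
            for X in product(range(a), repeat=g)]

def capacity_bits(P, iters=5000):
    """Blahut-Arimoto estimate of the capacity (bits) of the channel with rows P[i]."""
    a, b = len(P), len(P[0])
    r = [1.0 / a] * a
    for _ in range(iters):
        q = [sum(r[i] * P[i][j] for i in range(a)) for j in range(b)]
        d = [sum(P[i][j] * math.log2(P[i][j] / q[j]) for j in range(b) if P[i][j] > 0)
             for i in range(a)]
        w = [r[i] * 2.0 ** d[i] for i in range(a)]
        s = sum(w)
        r = [wi / s for wi in w]
    q = [sum(r[i] * P[i][j] for i in range(a)) for j in range(b)]
    return sum(r[i] * P[i][j] * math.log2(P[i][j] / q[j])
               for i in range(a) for j in range(b) if P[i][j] > 0)

q = [1 / 3] * 3
p = [[[1, 0, 0], [0, 0.5, 0.5]],    # state 1
     [[0, 1, 0], [0.5, 0, 0.5]],    # state 2
     [[0, 0, 1], [0.5, 0.5, 0]]]    # state 3
C_side = capacity_bits(derived_channel(q, p))          # state known at transmitter only
C_none = capacity_bits([[sum(q[t] * p[t][i][y] for t in range(3)) for y in range(3)]
                        for i in range(2)])            # no state information at all
print(C_side, math.log2(1.5), C_none)
```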
Some Miscellaneous Results in Coding Theory
Claude E. Shannon
This paper contains a number of somewhat miscellaneous results, centered chiefly on the problem of coding sources into noiseless channels, including cases where the channel symbols have different durations or costs.
The number of sequences of a given length
Suppose a number $n$ of letters are available whose lengths (or durations) are $a_1, a_2, \ldots, a_n$, and we wish a bound on the number of sequences of total length $\ell$. Here it is assumed that any sequence of letters is allowed. We define $N(\ell)$ to be the number of different sequences whose total length is greater than $\ell - a_{\min}$ but not greater than $\ell$. Here $a_{\min}$ is the smallest $a_i$. Thus $N(\ell)$ might be thought of as the number of sequences of length $\ell$ where we allow filling out with a blank to an extent up to the shortest letter. This convention makes $N(\ell)$ better behaved (e.g., it is now monotone increasing) than if we count only sequences of exactly length $\ell$.

$N(\ell)$ satisfies the difference equation
$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_n), \qquad \ell \ge a_{\min},$$
as we see by noting that each sequence of length $\ell$ must end in one or another of the available letters. Furthermore, the boundary conditions may be taken to be $N(\ell) = 0$ for $\ell < 0$ and $N(\ell) = 1$ for $0 \le \ell < a_{\min}$.
Associated with the difference equation is the following characteristic equation:
$$1 = X^{-a_1} + X^{-a_2} + \cdots + X^{-a_n}.$$
Since all the $a_i$ are positive and real, the right-hand member is a strictly monotone decreasing function of $X$ and varies from $\infty$ to 0 when $X$ goes from 0 to $\infty$. Consequently, the characteristic equation has a unique positive real root $W$.
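Because the right-hand member is strictly decreasing, $W$ can be found by bisection. A minimal sketch; `char_root` is a hypothetical name. For letters of durations 1 and 2 the root is the golden ratio, giving a capacity of about 0.694 bits per unit time.

```python
import math

def char_root(lengths, tol=1e-12):
    """Unique positive root W of 1 = sum_i W**(-a_i), found by bisection.

    The right-hand side decreases strictly in W, and at W = 1 it equals the
    number of letters, so the root is >= 1; the search starts from [1, 2]
    and widens the bracket if needed.
    """
    f = lambda w: sum(w ** (-a) for a in lengths) - 1
    lo, hi = 1.0, 2.0
    while f(hi) > 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

W = char_root([1, 2])   # letters of duration 1 and 2: W is the golden ratio
print(round(W, 6), round(math.log2(W), 4))   # 1.618034 0.6942
```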
Theorem: For all $\ell$, $N(\ell) \le W^{\ell}$. For all $\ell \ge 0$, $N(\ell) \ge W^{\ell - a_{\max}}$, where $a_{\max}$ is the largest $a_i$.

This will be proved by a kind of induction on increasing intervals of $\ell$, each interval of length $a_{\min}$. Consider first the upper bound $W^{\ell}$. This is certainly true for $\ell < 0$, where $N(\ell) = 0$, and for $0 \le \ell < a_{\min}$, since in this range $N(\ell) = 1$ and $W \ge 1$. Now assume the upper bound true for all $\ell < \ell_1$, where $\ell_1 \ge a_{\min}$. Then for $\ell$ in the range $\ell_1 \le \ell < \ell_1 + a_{\min}$ we have
$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_n) \le W^{\ell - a_1} + W^{\ell - a_2} + \cdots + W^{\ell - a_n} = W^{\ell}.$$
Thus the theorem is then true for the increased interval up to $\ell_1 + a_{\min}$. It follows that the bound is true for all $\ell$.

The lower bound is very similar. It is certainly true for $0 \le \ell \le a_{\max}$, since $N(\ell) \ge 1$ in this range and $W^{\ell - a_{\max}} \le 1$. The inductive step goes through as before. Assuming that for $0 \le \ell < \ell_1$ (with $\ell_1 > a_{\max}$) we have $N(\ell) \ge W^{\ell - a_{\max}}$, then in the extended range from $\ell_1$ to $\ell_1 + a_{\min}$ we have
$$N(\ell) = \sum_i N(\ell - a_i) \ge \sum_i W^{\ell - a_i - a_{\max}} = W^{\ell - a_{\max}}.$$
Thus by extending the range with steps of $a_{\min}$ we obtain the result for all positive $\ell$.
This result, of course, relates to how rapidly it is possible to approach the capacity of a noiseless channel with unequal symbol lengths. Thus for $\ell > 0$, from this theorem,
$$\left(1 - \frac{a_{\max}}{\ell}\right) \log W \le \frac{1}{\ell} \log N(\ell) \le \log W.$$
The approach of possible signalling rate to the capacity $\log W$ is rapid, the discrepancy being at most $\frac{a_{\max}}{\ell} \log W$.
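The two bounds are easy to check numerically for integer letter lengths. This sketch evaluates $N(\ell)$ from the difference equation and boundary conditions above and tests $W^{\ell - a_{\max}} \le N(\ell) \le W^{\ell}$; the helper names and the small relative slack for floating-point rounding are assumptions of the illustration.

```python
from functools import lru_cache

def char_root(lengths, tol=1e-12):
    """Positive root W of 1 = sum_i W**(-a_i) by bisection (right side decreasing)."""
    f = lambda w: sum(w ** (-a) for a in lengths) - 1
    lo, hi = 1.0, 2.0
    while f(hi) > 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def make_counter(lengths):
    """N(ell) from the difference equation and boundary conditions in the text."""
    amin = min(lengths)

    @lru_cache(maxsize=None)
    def N(ell):
        if ell < 0:
            return 0
        if ell < amin:
            return 1                         # only the empty sequence fits
        return sum(N(ell - a) for a in lengths)
    return N

def bounds_hold(lengths, upto=60):
    """Check W**(ell - a_max) <= N(ell) <= W**ell for integer ell = 0..upto."""
    W, amax, N = char_root(lengths), max(lengths), make_counter(lengths)
    slack = 1e-9
    return all(W ** (ell - amax) * (1 - slack) <= N(ell) <= W ** ell * (1 + slack)
               for ell in range(upto + 1))

N = make_counter([1, 2])
print([N(ell) for ell in range(8)])                   # [1, 1, 2, 3, 5, 8, 13, 21]
print(bounds_hold([1, 2]), bounds_hold([2, 3, 5]))    # True True
```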
An interesting alternative proof that $N(\ell) \le W^{\ell}$ can be given as follows. Assume, in contradiction, that for some $\ell$, $N(\ell) > W^{\ell}$. Then, since $N(0) = W^0$, there is a greatest lower bound of the $\ell$'s, say $\ell^*$, for which the theorem fails. In the interval $\ell^* \le \ell < \ell^* + a_{\min}$ there must be an $\ell$, say $\ell_1$, for which the theorem fails. Subdivide the sequences of length $\ell_1$ into subsets according to the first letter. Let the fractional number in the subset beginning with the letter $i$ be $f_i$ ($i = 1, 2, \ldots, n$). Choose the subset for which $a_i^{-1} \log f_i^{-1}$ is a minimum. In a sense, this means the subset which conveys the least information, $\log f_i^{-1}$, per unit time in its first letter. The minimum value of $a_i^{-1} \log f_i^{-1}$ among the different subsets is less than or equal to $\log W$. To see this, suppose, in contradiction, that for all $i$, $a_i^{-1} \log f_i^{-1} > \log W$. Then $f_i < W^{-a_i}$ and, summing on $i$, $1 = \sum_i f_i < \sum_i W^{-a_i} = 1$, a contradiction. Hence the subset chosen will have $a_i^{-1} \log f_i^{-1} \le \log W$, or $f_i \ge W^{-a_i}$. If we delete the first letter from all sequences in this subset, we are left with a set of more than $W^{\ell_1 - a_i}$ sequences of length $\ell_1 - a_i$. Thus $N(\ell_1 - a_i) > W^{\ell_1 - a_i}$. Since $\ell_1 - a_i < \ell^*$, this contradicts the assumption that $\ell^*$ was the greatest lower bound of the $\ell$'s for which the theorem fails. Hence the theorem is true for all $\ell$.
The case with unequal letters and a finite set of constraints
A more general problem of the same sort relates to sequences which are subject to a finite state set of constraints. Thus, suppose there are $d$ states and that in state $i$, letters of lengths $a_{ij}^{(s)}$ are permitted leading to state $j$. The index $s$ ranges over the different letters going from state $i$ to state $j$, and $j$ ranges over the different states which can follow state $i$. Let $N_{ij}(\ell)$ be the number of sequences which are possible and which start in state $i$, end in state $j$ and are of length $\ell$. These quantities are readily seen to satisfy the difference equations
$$N_{ij}(\ell) = \sum_k \sum_s N_{ik}\big(\ell - a_{kj}^{(s)}\big), \qquad \ell > 0. \qquad (1)$$
The corresponding characteristic equations are
$$\sum_i A_i \sum_s W^{-a_{ij}^{(s)}} = A_j, \qquad j = 1, \ldots, d. \qquad (2)$$
Let $W$ be the largest real root (there is a positive real root, as shown in the appendix) of the determinant equation
$$\left| \sum_s W^{-a_{ij}^{(s)}} - \delta_{ij} \right| = 0,$$
and let $A_i$ be a corresponding (positive) solution of (2). Suppose the graph of the constraints has complete accessibility, so that it is possible to go from any state to any other. Then all the $A_i$ are positive (none vanish).

We will show that the number of sequences of length $\ell$ starting in state $i$ and ending in $j$, $N_{ij}(\ell)$, is bounded by
$$N_{ij}(\ell) \le \frac{A_j}{A_i} W^{\ell}.$$
This is certainly true for $\ell < 0$ and also for $\ell = 0$, since then both sides are one if $i = j$, and otherwise the left side is zero with the right positive. We now proceed by the inductive type process as before, assuming the inequality out to some $\ell_1$ and then showing it follows for $\ell$ out to $\ell_1$ plus the minimum $a_{ij}^{(s)}$:
$$N_{ij}(\ell) = \sum_{k,s} N_{ik}\big(\ell - a_{kj}^{(s)}\big) \le \sum_{k,s} \frac{A_k}{A_i} W^{\ell - a_{kj}^{(s)}} = \frac{W^{\ell}}{A_i} \sum_k A_k \sum_s W^{-a_{kj}^{(s)}} = \frac{A_j}{A_i} W^{\ell}.$$
Thus the inductive step carries the inequality up to $\ell = \ell_1 + \min a_{ij}^{(s)}$, and hence it is true for all $\ell$.
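The quantities $W$ and $A_i$ can be computed without a matrix library. The sketch below assumes integer letter lengths and a completely accessible constraint graph with $\rho(F(1)) \ge 1$, where $F_{ij}(W) = \sum_s W^{-a_{ij}^{(s)}}$. It finds the $W$ at which the spectral radius of $F(W)$ equals 1 by bisection, takes $A$ as the left Perron vector by power iteration on $F + I$ (the shift is an implementation choice that avoids oscillation on periodic graphs), and checks the bound against counts from the difference equations. All names and the two-state example are hypothetical. In the example, $\det(F - I) = 0$ reduces to $W^{-1} + W^{-5} = 1$, whose root is the real root of $W^3 = W + 1$.

```python
def F(W, letters, d):
    """F_ij(W) = sum of W**(-a) over letters (i, j, a) from state i to state j."""
    M = [[0.0] * d for _ in range(d)]
    for i, j, a in letters:
        M[i][j] += W ** (-a)
    return M

def left_perron(M, iters=500):
    """Left eigenvector A (A M = rho A) and rho, by power iteration on M + I."""
    d = len(M)
    A = [1.0 / d] * d
    for _ in range(iters):
        B = [A[j] + sum(A[i] * M[i][j] for i in range(d)) for j in range(d)]
        s = sum(B)
        A = [x / s for x in B]
    rho = sum(A[i] * M[i][j] for i in range(d) for j in range(d)) / sum(A)
    return A, rho

def constraint_root(letters, d, tol=1e-13):
    """W with spectral radius of F(W) equal to 1 (rho decreases as W grows)."""
    rho = lambda W: left_perron(F(W, letters, d))[1]
    lo, hi = 1.0, 2.0
    while rho(hi) > 1:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if rho(mid) > 1 else (lo, mid)
    W = (lo + hi) / 2
    return W, left_perron(F(W, letters, d), iters=2000)[0]

def counts(letters, d, L):
    """N[ell][i][j] for ell = 0..L from N_ij(ell) = sum_{k,s} N_ik(ell - a_kj^(s))."""
    N = [[[1 if i == j else 0 for j in range(d)] for i in range(d)]]   # ell = 0
    for ell in range(1, L + 1):
        layer = [[0] * d for _ in range(d)]
        for i in range(d):
            for k, j, a in letters:          # a letter of length a from state k to j
                if ell - a >= 0:
                    layer[i][j] += N[ell - a][i][k]
        N.append(layer)
    return N

# Hypothetical two-state example: (from state, to state, length) per letter.
letters = [(0, 0, 1), (0, 1, 2), (1, 0, 3)]
W, A = constraint_root(letters, 2)
N = counts(letters, 2, 40)
print(round(W, 6))   # 1.324718
```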
An explicit code for a variable length alphabet
It is possible to generalize a coding process we have described elsewhere for a binary alphabet to the case where there are a number of symbols of different "durations" or, more generally, with certain associated "costs." It is desired to encode a finite set of possible messages with associated probabilities $p_1, p_2, \ldots, p_M$ into sequences of letters chosen from an alphabet where the letter $i$ has cost or duration $\ell_i$, and it is desired in the code to minimize the expected cost. This problem has been studied in a thesis by Richard Marcus.
We shall use in our analysis a curious notation for real numbers based on unequal values for various digits. In the ordinary decimal notation, the range from 0 to 1 is divided into ten equal intervals. These are labeled with the digits from 0 to 9. Each of these intervals is again subdivided equally and again given labels. In the notation system we are now describing, the interval is subdivided into arbitrary sub-intervals of length $\lambda_0, \lambda_1, \ldots, \lambda_{n-1}$, not necessarily equal but with $\sum_k \lambda_k = 1$. If a real number between 0 and 1 falls in the interval $\lambda_k$ (closed on the left, open on the right), its first digit is $k$. All of the intervals are subdivided in the same proportions and this determines the second digit, etc.

This notation system has many of the properties of ordinary binary, ternary, etc., systems, such as unicity of representation, apart from numbers terminating in an infinite sequence of 0's or $(n-1)$'s. However, it does differ in certain important respects. For example, if a real number is chosen at random, then in an ordinary decimal notation we expect one-tenth of each value of digit. In this notation we expect a fraction $\lambda_i$ of digit $i$.
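The unequal-interval notation can be sketched directly from this description. The example below assumes exact `Fraction` arithmetic so that boundary cases follow the closed-left, open-right rule exactly; `expand` is a hypothetical name. With ten equal widths it reproduces ordinary decimal digits.

```python
from fractions import Fraction

def expand(x, widths, t):
    """First t 'digits' of x in [0, 1) for sub-interval widths lambda_0..lambda_{n-1}.

    Each interval is closed on the left and open on the right, and every
    interval is subdivided in the same proportions as the unit interval.
    """
    digits, low, width = [], Fraction(0), Fraction(1)
    for _ in range(t):
        start = low
        for k, lam in enumerate(widths):
            end = start + width * lam
            if x < end or k == len(widths) - 1:
                digits.append(k)
                low, width = start, width * lam
                break
            start = end
    return digits

tenth, half, quarter = Fraction(1, 10), Fraction(1, 2), Fraction(1, 4)
print(expand(Fraction(375, 1000), [tenth] * 10, 3))          # [3, 7, 5]
print(expand(Fraction(3, 4), [half, quarter, quarter], 3))   # [2, 0, 0]
```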
Returning now to the coding problem, we recall that if a set of channel letters have durations $\ell_1, \ell_2, \ldots, \ell_n$, the corresponding channel capacity is $C = \log W_0$, where $W_0$ is the unique positive real root of
$$\sum_i W^{-\ell_i} = 1.$$
Given a set of $\ell_i$ and the corresponding $W_0$, we define a subdivision of the unit interval and a corresponding notation for real numbers by the quantities
$$W_0^{-\ell_1},\; W_0^{-\ell_2},\; \ldots,\; W_0^{-\ell_n}.$$
Since these are all positive and their sum is unity, they form a satisfactory subdivision.

Now let a set of messages have probabilities $p_1 \ge p_2 \ge \cdots \ge p_M$ and let $P_k = \sum_{i=1}^{k-1} p_i$, so $P_k$ is the cumulative probability for the first $k - 1$ messages when the messages are arranged in order of decreasing probability.

The code to be used is defined as follows. Let $P_k$ be expanded in the notation defined by the subdivision $W_0^{-\ell_i}$ out to just enough places to make the uncertainty due to "digits" beyond this point less than $p_k$. In other words, if $P_k$ is represented in this notation system by the sequence $a_{k1}, a_{k2}, a_{k3}, \ldots$, then we carry out the expansion to $t$ places, where $t$ is chosen to make
$$W_0^{-\sum_{i=1}^{t} \ell_{a_{ki}}} < p_k \le W_0^{-\sum_{i=1}^{t-1} \ell_{a_{ki}}}. \qquad (1)$$
The code we are defining represents message $k$ by the sequence of $t$ channel symbols corresponding to the $t$ digits of this expansion, $a_{k1}, a_{k2}, \ldots, a_{kt}$. It should first be noted that this does in fact form a reversible code. It satisfies the so-called prefix condition: no code word is the beginning of any other code word. Indeed, the code word corresponding to $P_k$ defines an interval including $P_k$ and of width less than $p_k$. This interval consequently does not include $P_{k-1}$ or any earlier $P$, and the code word must differ in some "digit" from all preceding code words. Consequently all code words differ and any sequence of code words is uniquely decipherable.
We now wish to estimate the expected length of code words, that is, $\sum_k p_k L_k$. From (1) we have
$$\log W_0 \cdot \sum_{i=1}^{t} \ell_{a_{ki}} > \log p_k^{-1} \ge \log W_0 \cdot \sum_{i=1}^{t-1} \ell_{a_{ki}}.$$
Multiplying by $p_k$ and summing over all $k$ gives
$$\log W_0 \cdot \sum_k p_k L_k > \sum_k p_k \log p_k^{-1} \ge \log W_0 \cdot \sum_k p_k (L_k - \ell_{\max}),$$
where $L_k = \sum_{i=1}^{t} \ell_{a_{ki}}$ is the length of the code word for message $k$ and, on the right, we have underestimated by replacing the last term in the sum on $i$ by its maximum possible value, $\ell_{\max}$, the largest duration of any letter. Now recognizing that $\sum_k p_k \log p_k^{-1} = H$, the entropy or average information of the message source, and using $L = \sum_k p_k L_k$ to represent the expected length of a code word, this may be written
$$L \log W_0 \ge H \ge (L - \ell_{\max}) \log W_0$$
or
$$\frac{H}{\log W_0} \le L \le \frac{H}{\log W_0} + \ell_{\max}.$$
This is our desired result. Of course the lower bound holds for any reversible code. The upper bound shows that one can approximate the ideal lower bound to within $\ell_{\max}$. In particular, if one is working with messages which consist of blocks or $n$-grams of text, then $H$ becomes $nH_n$, where $H_n$ is the entropy per letter for blocks of length $n$. As $n$ increases, $H_n$ approaches $H$, the entropy per letter of the message source. Dividing the inequalities by $n$ we have, in this case,
$$\frac{H_n}{\log W_0} \le \frac{L}{n} \le \frac{H_n}{\log W_0} + \frac{\ell_{\max}}{n}.$$
In other words, the average code length per letter of message has a discrepancy of at most $\ell_{\max}/n$ from its ideal value on the basis of $n$-gram entropy. This is closely analogous to our previous result with channel letters of equal duration.
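The whole construction fits in a short sketch, assuming floating-point arithmetic (so exact ties at interval boundaries are not handled rigorously), digit $i$ identified with channel letter $i$, and hypothetical names throughout. For letters of durations 1 and 2 and probabilities 0.4, 0.3, 0.2, 0.1 it yields a prefix-free code with expected duration 3.0, inside the bounds $H/\log W_0 \approx 2.66$ and $H/\log W_0 + \ell_{\max} \approx 4.66$.

```python
import math

def char_root(lengths, tol=1e-12):
    """Positive root W of sum_i W**(-l_i) = 1, by bisection."""
    f = lambda w: sum(w ** (-a) for a in lengths) - 1
    lo, hi = 1.0, 2.0
    while f(hi) > 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def explicit_code(probs, durations):
    """Digit-expansion code for letters with unequal durations (a sketch).

    probs must be in decreasing order. Message k is coded by the first t digits
    of P_k = p_1 + ... + p_{k-1}, in the notation whose sub-interval widths are
    W0**(-duration of letter i), with t the least number of digits that makes
    the remaining uncertainty smaller than p_k.
    """
    W0 = char_root(durations)
    lam = [W0 ** (-dur) for dur in durations]
    words, P = [], 0.0
    for p in probs:
        word, low, width = [], 0.0, 1.0
        while width >= p:                     # uncertainty not yet below p_k
            start = low
            for i, li in enumerate(lam):
                end = start + width * li
                if P < end or i == len(lam) - 1:
                    word.append(i)            # digit i = channel letter i
                    low, width = start, width * li
                    break
                start = end
        words.append(word)
        P += p                                # cumulative probability for the next message
    return W0, words

def is_prefix_free(words):
    return not any(i != j and words[j][:len(words[i])] == words[i]
                   for i in range(len(words)) for j in range(len(words)))

probs, durations = [0.4, 0.3, 0.2, 0.1], [1, 2]
W0, words = explicit_code(probs, durations)
L = sum(p * sum(durations[c] for c in w) for p, w in zip(probs, words))
H = -sum(p * math.log2(p) for p in probs)
print(words, round(L, 6))   # [[0, 0], [0, 1], [1, 0, 0], [1, 1, 0]] 3.0
print(H / math.log2(W0) <= L <= H / math.log2(W0) + max(durations))   # True
```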
An inequality for a Huffman-type code
A Huffman code (2) for cases of equal cost binary symbols is optimal in giving the minimum expected length and must therefore have an expected length less than or equal to $H + 1$ since, as shown above or in (1), these digit expansion codes, which are not necessarily optimal, satisfy this inequality. Peter Elias suggested the desirability of a direct proof of this upper bound from the Huffman procedure. In solving this problem a slightly stronger result was obtained, as follows.

Theorem: In a Huffman binary code, the expected word length $L$ satisfies
$$H \le L \le H + 1 - 2p_{\min},$$
where $H$ is the entropy (in bits) of the set of probabilities and $p_{\min}$ is the smallest probability in the set.
Proof: The lower bound is of course well known. The upper bound will be proved by induction. We will assume it true for all codes corresponding to trees with $n - 1$ branch points and show that it follows for those with $n$ branch points (Fig. 1). Consider, then, a Huffman tree with $n$ branch points and focus attention on the two smallest probabilities. These occur, by the method of construction, at the ends of one fork. Let these probabilities be $p$ and $q$ with, say, $p \le q$. If we delete the $pq$ branches, leaving $P = p + q$ at the junction, we have left a Huffman tree (because of the method of construction) with $n - 1$ junctions and to which our inductive assumption applies. Let unprimed letters relate to this tree and primed letters to the enlarged tree. Then we have $p'_{\min} = p$ (the minimum probability for the enlarged tree) and, since both $p$ and $q$ are less than or equal to $p_{\min}$ (the minimum probability for the smaller tree), $P \le 2p_{\min}$. The average code lengths $L$ and $L'$ and the entropies $H$ and $H'$ are clearly related as follows:
$$L' = L + P, \qquad H' = H + P\, H\!\left(\frac{p}{P}, \frac{q}{P}\right).$$
Finally, by inductive assumption, $L \le H + 1 - 2p_{\min}$. Hence
$$L' \le H + 1 - 2p_{\min} + P = H' - P\, H\!\left(\frac{p}{P}, \frac{q}{P}\right) + 1 - 2p_{\min} + P \le H' - P\, H\!\left(\frac{p}{P}, \frac{q}{P}\right) + 1, \qquad (2)$$
since $P \le 2p_{\min}$. Now by the concavity of the curve $H(x, 1 - x)$ we have, for $x \le 1/2$, that $H(x, 1 - x) \ge 2x$ (Fig. 2). Hence, recalling that $p \le q$, we have
$$P\, H\!\left(\frac{p}{P}, \frac{q}{P}\right) \ge P \cdot \frac{2p}{P} = 2p.$$
Using this in the above inequality (2) we conclude
$$L' \le H' + 1 - 2p.$$
Since $p = p'_{\min}$, this completes the induction. The theorem is true for one branch point, probabilities $p$ and $q \ge p$, since in this case
$$L = 1 \le H(p, q) + 1 - 2p,$$
using, again, the fact that $H(x, 1 - x) \ge 2x$ for $x \le 1/2$.
This result is easily generalized to the case where there are $b$ available (equal length) letters in the alphabet. In this case it can be shown that
$$L \le \frac{H}{\log b} + 1 - d\,p_{\min},$$
where $d$ is the number of branches on the minimum probability branch point of the tree. Thus $d$ is the remainder if $b - 1$ is subtracted from $n$, the number of messages, enough times to give a remainder less than or equal to $b$. The proof of this result is by the obvious generalization of the above proof and is left to the reader.
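The binary theorem can be spot-checked numerically. This sketch assumes at least two messages and uses the fact, visible in the relation $L' = L + P$ of the proof, that each Huffman merge of weights $p$ and $q$ adds $p + q$ to the expected length, so no tree needs to be built. The names and the random test distributions are illustrative; a sampled check supports but does not prove the theorem.

```python
import heapq
import math
import random

def huffman_expected_length(probs):
    """Expected length of a binary Huffman code for probs (at least two of them).

    Merging the two smallest weights p, q lengthens every word below the new
    node by one, adding p + q to the expected length.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p, q = heapq.heappop(heap), heapq.heappop(heap)
        total += p + q
        heapq.heappush(heap, p + q)
    return total

def entropy_bits(probs):
    return -sum(x * math.log2(x) for x in probs if x > 0)

def within_bound(probs, eps=1e-12):
    """Check H <= L <= H + 1 - 2 p_min, with a small eps for rounding."""
    L, H = huffman_expected_length(probs), entropy_bits(probs)
    return H - eps <= L <= H + 1 - 2 * min(probs) + eps

print(huffman_expected_length([0.5, 0.25, 0.25]))   # 1.5
rng = random.Random(7)
trials = []
for _ in range(2000):
    w = [rng.random() + 1e-3 for _ in range(rng.randint(2, 12))]
    trials.append([x / sum(w) for x in w])
print(all(within_bound(t) for t in trials))          # True
```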
Appendix: Existence of the characteristic equation root
Lemma: Given functions $f_{ij}(w)$ ($i, j = 1, 2, \ldots, d$), continuous in the range $a \le w \le b$, such that in this range $f_{ij}(w) \ge 0$ and $\sum_j f_{ij}(w) > 0$, and such that $\sum_j f_{ij}(a) < 1 < \sum_j f_{ij}(b)$ for every $i$, then there exist $W$, $a < W < b$, and a set of $X_i \ge 0$, $\sum_i X_i = 1$, such that
$$\sum_i X_i f_{ij}(W) = X_j \quad (j = 1, \ldots, d), \qquad\text{and hence}\qquad \left| f_{ij}(W) - \delta_{ij} \right| = 0.$$
Proof: Consider the region $R$ whose points are $(X_1, \ldots, X_d, W)$, where $X_i \ge 0$, $\sum_i X_i = 1$, and $a \le W \le b$. This is a topological image of a sphere and its interior. For points of $R$, consider the continuous mapping
$$Y_j = \frac{\sum_i X_i f_{ij}(W)}{\sum_{i,j} X_i f_{ij}(W)}, \qquad V = \begin{cases} W + 1 - \sum_{i,j} X_i f_{ij}(W) & \text{if this lies in } [a, b], \\ a & \text{if it is } < a, \\ b & \text{if it is } > b. \end{cases}$$
Note that the denominator for $Y_j$ does not vanish because of our assumption that $\sum_j f_{ij}(w) > 0$, and hence the $Y_j$ are well defined. Also the $Y_j$ are non-negative and $\sum_j Y_j = 1$. Finally $a \le V \le b$. Hence this maps points $(X_i, W)$ of $R$ continuously into points $(Y_j, V)$ of $R$. Consequently, by the Brouwer fixed point theorem, there exists a point $(X, W)$ which is mapped into itself, that is, a point for which
$$\sum_i X_i f_{ij}(W) = X_j \sum_{i,j} X_i f_{ij}(W), \qquad W = V.$$
The value of $W$ for the fixpoint clearly is not $a$ or $b$, since these points are moved upward or downward by our assumptions. Hence for the fixpoint we have $W = W + 1 - \sum_{i,j} f_{ij}(W) X_i$, or $\sum_{i,j} f_{ij}(W) X_i = 1$. It follows that for the fixpoint
$$\sum_i X_i f_{ij}(W) = X_j, \qquad\text{so that}\qquad \left| f_{ij}(W) - \delta_{ij} \right| = 0.$$
Let the elements $a_{ij}$ of a matrix be non-negative. Suppose there is an eigenvector $A_i$ all of whose components are positive, $A_i > 0$, and the corresponding characteristic value is $\lambda_0$. We will show that for any other characteristic value $\lambda$ we have $|\lambda| \le \lambda_0$. Let $B_i$ be a characteristic vector for $\lambda$, where we adjust the length of this vector as follows. Choose its length in such a way that $A_i - |B_i| \ge 0$ for all $i$ and the equality holds for at least one $i$, say $i = h$, so that $A_h = |B_h|$. It is clear that this can be done since with zero length all components of $B$ are less than those of $A$ and, increasing the length continuously, eventually a first one of the $|B_i|$ reaches its corresponding $A_i$. We now have
$$\sum_i A_i a_{ij} = \lambda_0 A_j, \qquad (1)$$
$$\sum_i |B_i|\, a_{ij} \ge \Big| \sum_i B_i a_{ij} \Big| = |\lambda|\, |B_j|. \qquad (2)$$
Subtracting these for $j = h$,
$$\sum_i \big(A_i - |B_i|\big) a_{ih} \le (\lambda_0 - |\lambda|) A_h.$$
All terms in the sum at the left are non-negative and also $A_h$ is definitely positive. It follows that $\lambda_0 - |\lambda| \ge 0$.
Error Probability Bounds for Noisy Channels
This paper gives a simplified proof akin to that published previously (3) but leading to tighter bounds on the error probability and to a simpler final result. We consider a discrete memoryless channel defined by a set of letter transition probabilities $p_i(j)$. Assume a given assignment of input letter probabilities $P_i$. These might be, but are not necessarily, the set which gives channel capacity. Let
$$Q_j = \sum_i P_i\, p_i(j)$$
be the output letter probabilities that would result if the $P_i$ were used for input probabilities.

We consider, as usual, a random code ensemble based on the $P_i$ containing $M = e^{nR}$ messages, each with a code word of length $n$. In the ensemble of codes, $M$ messages, say the integers from 1 to $M$, are mapped independently into the possible input words of length $n$ for the channel. A message is mapped into a code word with probability equal to that of the code word produced by the product probabilities generated by the $P_i$. Thus, the various possible codes in the ensemble have associated probabilities equal to the probability of their occurrence under this system. We wish to overbound the average error probability for this ensemble of codes with a decoding system to be described, where the error probabilities of individual codes are weighted with the probabilities associated with the particular codes.
The mutual information $I(u; v)$ between an input word $u$ and an output word $v$ is given by
$$I(u; v) = \log \frac{\Pr_1(v \mid u)}{\Pr_1(v)},$$
where $\Pr_1$ means probability calculated by the given letter assignment $P_i$ and the given transitions $p_i(j)$ (extended independently to blocks of length $n$). $I(u; v)$ may be thought of here as a number associated with any input word-output word pair. $I(u; v)$ is the sum of the mutual informations between corresponding letters of $u$ and $v$. Thus if $u$ consists of the letters $u_1, u_2, \ldots, u_n$ and $v$ of $v_1, v_2, \ldots, v_n$, then, because of the independence of channel and letter assignments, we have
$$I(u; v) = \log \frac{\prod_i \Pr_1(v_i \mid u_i)}{\prod_i \Pr_1(v_i)} = \sum_i \log \frac{\Pr_1(v_i \mid u_i)}{\Pr_1(v_i)}.$$
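A quick numerical illustration of the additivity, assuming a hypothetical binary symmetric channel with crossover probability 0.1 and $P = (1/2, 1/2)$; `letter_info` and `word_info` are illustrative names. Computing $I(u; v)$ from the word probabilities and summing the letter terms give the same value.

```python
import math

def letter_info(P, p, i, j):
    """log2 of p_i(j) / Q_j for a single letter pair (bits)."""
    Q_j = sum(P[k] * p[k][j] for k in range(len(P)))
    return math.log2(p[i][j] / Q_j)

def word_info(P, p, u, v):
    """I(u; v) computed directly as log2 of prod p_{u_i}(v_i) / prod Q_{v_i}."""
    num = math.prod(p[a][b] for a, b in zip(u, v))
    den = math.prod(sum(P[k] * p[k][b] for k in range(len(P))) for b in v)
    return math.log2(num / den)

# Hypothetical example: binary symmetric channel with crossover 0.1, P = (1/2, 1/2).
P, p = [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]
u, v = (0, 1, 1), (0, 1, 0)
direct = word_info(P, p, u, v)
by_letters = sum(letter_info(P, p, a, b) for a, b in zip(u, v))
print(round(direct, 4), round(by_letters, 4))   # -0.6259 -0.6259
```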
If we now think of choosing an input word $u$ and an output word $v$ according to some joint probability, then $I(u; v)$ becomes a random variable. In particular, we may choose an input word $u$ according to the product probability measure obtained from the probability assignments $P_i$, and then an output word $v$ according to the transition probabilities $p_i(j)$ (independently applied to the letters of $u$). In the ensemble of random codes, input words $u$ and noisy received words $v$ will occur with this joint probability measure.
We define a decoding system for codes in the ensemble as follows.
Any received word v is decoded as that message in the code in question
whose code word u has the largest $I(u,v)$. (If several have equal values,
take the smallest numbered message from this set.) This might be
called maximum information decoding. It must be remembered, however,
that mutual information is here calculated by the original probability
assignments produced by the $P_i$. It is not necessarily maximum information
decoding for any particular code or word in a code in the ensemble.
It is actually, however, equivalent to decoding as the most probable
cause of the received word, and is therefore optimal for obtaining small error
probability. This is because all messages have equal a priori probability,
so if
$$\log \frac{P(v \mid u_1)}{P(v)} > \log \frac{P(v \mid u_2)}{P(v)}, \quad \text{then} \quad P(v \mid u_1) > P(v \mid u_2).$$
Hence if message $m_1$ is mapped into $u_1$ and $m_2$ into $u_2$, it follows from their
equal prior probability that $P(m_1 \mid v) > P(m_2 \mid v)$.
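As a check on the equivalence just claimed, the following Python sketch enumerates every code of a short block length for a small channel and compares maximum information decoding with decoding by the largest likelihood $P(v \mid u)$. The channel (the one of Fig. 1, used in the example below) and all helper names are our own illustrative choices; in both rules ties go to the smallest-numbered message, as the text prescribes.

```python
import itertools
import math

# Hypothetical example channel: the one of Fig. 1, with inputs 0, 1 standing for
# u1, u2 and outputs 0, 1 standing for v1, v2.
P = [0.6, 0.4]                  # assigned letter probabilities P_i
p = [[0.7, 0.3], [0.5, 0.5]]    # transition probabilities p_i(j)
Q = [sum(P[i] * p[i][j] for i in range(2)) for j in range(2)]  # output probabilities

TOL = 1e-12  # scores closer than this are treated as ties

def info(u, v):
    """I(u; v) in nats, as a sum of per-letter terms log p_i(j)/Q_j."""
    if any(p[i][j] == 0 for i, j in zip(u, v)):
        return -math.inf
    return sum(math.log(p[i][j] / Q[j]) for i, j in zip(u, v))

def likelihood(u, v):
    """Pr(v | u) for the memoryless channel."""
    return math.prod(p[i][j] for i, j in zip(u, v))

def decode(code, v, score):
    """Message with the largest score; ties go to the smallest-numbered message."""
    best = 0
    for m in range(1, len(code)):
        if score(code[m], v) > score(code[best], v) + TOL:
            best = m
    return best

def rules_agree(n, num_messages):
    """True if maximum-information and maximum-likelihood decoding pick the same
    message for every code of block length n and every received word."""
    words = list(itertools.product(range(2), repeat=n))
    return all(decode(code, v, info) == decode(code, v, likelihood)
               for code in itertools.product(words, repeat=num_messages)
               for v in words)

print(rules_agree(2, 2), rules_agree(3, 3))  # True True
```

The agreement is exact here because, for a fixed v, $I(u;v)$ differs from $\log P(v \mid u)$ only by the constant $\log P(v)$.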
We may also define a second joint probability measure for $(u,v)$
pairs as follows. Consider choosing a u word according to the assigned
probabilities and a v word independently according to the assigned
probabilities. This joint probability measure we denote by $\Pr_2$.
We may also consider $I(u,v)$ as a random variable with this set of
probabilities $\Pr_2(u,v)$ for $(u,v)$ pairs. However, a peculiar point
arises in that some of the $P(v \mid u)$ may be zero. For these $(u,v)$ pairs,
$I(u,v) = \log \frac{P(v \mid u)}{P(v)}$ is undefined. (It approaches $-\infty$ as $P(v \mid u)$ approaches
zero.) This caused no trouble in the $\Pr_1$ probability measure since these
$(u,v)$ pairs had zero probability in that case. Here, however, these $(u,v)$
pairs may have positive probability. We may still, however, consider
the distribution function for $I(u,v)$ in the new $\Pr_2$ measure. Thus
$\Pr_2[I \ge a]$ means the sum of probabilities of all $(u,v)$ pairs in this measure
for which $I(u,v)$ is defined and at least a. In other words, calculate
the distribution function as though there were a certain probability of I
being at $-\infty$. The cumulative distribution function from the left would start
not at zero but at a positive value if there were some $(u,v)$ pairs with
$P(v \mid u) = 0$.
Lemma: For any a, the average error probability $P_e$ for the
described ensemble of codes is bounded by
$$P_e \le \Pr_1[I \le a] + M \Pr_2[I \ge a].$$
Proof: In the ensemble of codes, input words and received versions
of them occur with the probability measure $\Pr_1(u,v)$. Thus, in the
lemma, the term $\Pr_1[I \le a]$ can be identified with the probability of a
message resulting in a received word with mutual information as low as or
lower than a threshold level a.
The term $M \Pr_2[I \ge a]$ may be interpreted as follows. In the ensemble
of codes, suppose message number 1 occurs. This will give rise to
various received v's with probability (over the ensemble) given by $\Pr_1(v)$.
This is because message 1 is mapped into all possible u's with the
assigned u probabilities. Now consider the probability that message
number 2 has a mutual information with the received version of message 1
greater than or equal to a. This is given by $\Pr_2[I \ge a]$, since in the
ensemble message 2 is mapped independently into the u space. The same
applies, of course, to messages 3, 4, ..., M and, in fact, to all messages
other than the actual cause of the received word. The probability
that any message (apart from the actual cause) has a mutual information
with the received v of at least a is at most
$$1 - (1 - \Pr_2[I \ge a])^{M-1} \le (M-1)\Pr_2[I \ge a] < M \Pr_2[I \ge a].$$
Thus the probability that either the actual message has a mutual information
of at most a or that some other message has a mutual information of at least a
with the received v is bounded by
$$\Pr_1[I \le a] + M \Pr_2[I \ge a].$$
(The probability of either or both of two events can always be bounded by
the sum of their individual probabilities, whether or not the events are
independent.) If neither of these events occurs, the decoding will be
correct since the mutual information with the actual cause is greater than
a and that with all other messages is less than a. Thus the error probability
in the ensemble is bounded by
$$P_e \le \Pr_1[I \le a] + M \Pr_2[I \ge a].$$
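As an example of a random code ensemble, consider the following
situation. Suppose there are two input words and two output words with
the transition probabilities shown in Fig. 1.

FIG. 1 (transition probabilities $u_1 \to v_1$: .7, $u_1 \to v_2$: .3, $u_2 \to v_1$: .5, $u_2 \to v_2$: .5)

The arbitrary assignments of probability .6 and .4 have been made to
$u_1$ and $u_2$, and this results in .7 × .6 + .5 × .4 = .62 for $Q(v_1)$ and
.38 for $Q(v_2)$. Suppose there are two messages, 1 and 2. The random
ensemble of codes then consists of four codes (coding, then maximum
information decoding with ties going to message 1):

code 1: 1 → $u_1$, 2 → $u_1$; $v_1$ → 1, $v_2$ → 1; Pr(code 1) = .6 × .6 = .36
code 2: 1 → $u_1$, 2 → $u_2$; $v_1$ → 1, $v_2$ → 2; Pr(code 2) = .6 × .4 = .24
code 3: 1 → $u_2$, 2 → $u_1$; $v_1$ → 2, $v_2$ → 1; Pr(code 3) = .4 × .6 = .24
code 4: 1 → $u_2$, 2 → $u_2$; $v_1$ → 1, $v_2$ → 1; Pr(code 4) = .4 × .4 = .16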
The distribution of mutual information under the two measures
$\Pr_1$ and $\Pr_2$ is given by the following table:

pair     Pr_1              Pr_2                I (bits)
u1 v1    .6 × .7 = .42     .6 × .62 = .372      .175
u1 v2    .6 × .3 = .18     .6 × .38 = .228     -.341
u2 v1    .4 × .5 = .20     .4 × .62 = .248     -.310
u2 v2    .4 × .5 = .20     .4 × .38 = .152      .396

With $p_1(x) = \Pr_1[I \le x]$ and $1 - p_2(x) = \Pr_2[I \ge x]$, the functions $p_1$ and $1 - p_2$,
together with the sum $p_1 + (M-1)(1 - p_2) = p_1 + 1 - p_2$ (since M = 2), are shown in Fig. 2.

FIG. 2

This example, of course, uses a ridiculously small M and small
number of input and output words in order to keep the number of codes
and other complexities down. According to the lemma, the error
probability will not exceed the curve $p_1(x) + 1 - p_2(x)$ at any point. The
best choice of x is clearly one between -.310 and .175, for which the
sum curve is .38 + .524 = .904. Thus we may assert that for the random ensemble
$P_e \le .904$. Actually, if the messages are equally likely, the error
probability is given exactly by
$$P_e = .36(.5) + .24(.4) + .24(.4) + .16(.5) = .452.$$
An optimal code for two messages into this channel clearly maps them
into the two input words and gives an error probability with optimal
decoding of .4. The bound of the lemma does not become very useful
or significant until the number of messages and possible input words
is reasonably large.
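The numbers in this example can be reproduced by brute force. The sketch below is a hypothetical helper, not part of the text: it enumerates the four codes, decodes by maximum information with ties to message 1, and evaluates the bound $\Pr_1[I \le a] + (M-1)\Pr_2[I \ge a]$ at thresholds lying between the four possible values of I, taking $\Pr_2(u_i, v_j) = P_i Q_j$ directly from the definition of the second measure.

```python
import itertools
import math

P = [0.6, 0.4]                  # P(u1), P(u2)
p = [[0.7, 0.3], [0.5, 0.5]]    # p_i(j): rows u1, u2; columns v1, v2
Q = [sum(P[i] * p[i][j] for i in range(2)) for j in range(2)]  # about [0.62, 0.38]
M = 2

def info_bits(i, j):
    return math.log2(p[i][j] / Q[j])

# Exact ensemble error probability: each message independently gets u1 or u2.
ensemble_pe = 0.0
for code in itertools.product(range(2), repeat=M):
    prob_code = math.prod(P[i] for i in code)
    pe_code = 0.0
    for j in range(2):
        scores = [info_bits(code[m], j) for m in range(M)]
        decoded = scores.index(max(scores))          # ties -> smallest message
        pe_code += sum(p[code[m]][j] for m in range(M) if m != decoded) / M
    ensemble_pe += prob_code * pe_code

# Lemma bound with Pr1(u_i, v_j) = P_i p_i(j) and Pr2(u_i, v_j) = P_i Q_j.
pairs = [(info_bits(i, j), P[i] * p[i][j], P[i] * Q[j]) for i in range(2) for j in range(2)]

def bound(a):
    pr1 = sum(w1 for x, w1, _ in pairs if x <= a)
    pr2 = sum(w2 for x, _, w2 in pairs if x >= a)
    return pr1 + (M - 1) * pr2

levels = sorted(x for x, _, _ in pairs)
candidates = [levels[0] - 1] + [(a + b) / 2 for a, b in zip(levels, levels[1:])] + [levels[-1] + 1]
best_a = min(candidates, key=bound)
print(round(ensemble_pe, 3), round(bound(best_a), 3))  # 0.452 0.904
```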
We now wish to express the bound of the lemma in terms of the
assigned probabilities $P_i$ and the transition probabilities $p_i(j)$. As
noted above, I is the sum of n independent random variables (the
mutual informations between corresponding letters of u and v). This
is true both for the $\Pr_1$ and $\Pr_2$ probability measures. Thus each
term of our bound relates to the problem of estimating the tail of a
distribution which is the sum of n identically distributed random
variables. We may conveniently estimate such tails by the Chernoff
bound involving the logarithm of the moment generating function, say
$\mu(s)$, of the individual random variables. Chernoff's bound states that
if $X_n$ is the sum of n such random variables, then
$$\Pr[X_n \ge n\mu'(s)] \le e^{n(\mu(s) - s\mu'(s))}, \quad s > 0,$$
and similarly $\Pr[X_n \le n\mu'(s)] \le e^{n(\mu(s) - s\mu'(s))}$ for $s < 0$.
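To get a feel for how tight the Chernoff bound is, the sketch below compares it with exact binomial tails for a sum of n Bernoulli variables. The Bernoulli case, n = 100 and p = 0.1 are illustrative assumptions; the same expression $e^{n(\mu(s) - s\mu'(s))}$ bounds the upper tail when $s > 0$ and the lower tail when $s < 0$.

```python
import math

def mu(s, p):
    """Log moment generating function of one Bernoulli(p) variable."""
    return math.log(1 - p + p * math.exp(s))

def mu_prime(s, p):
    return p * math.exp(s) / (1 - p + p * math.exp(s))

def chernoff(n, s, p):
    """e^{n(mu(s) - s mu'(s))}: bounds Pr[X_n >= n mu'(s)] for s > 0
    and Pr[X_n <= n mu'(s)] for s < 0."""
    return math.exp(n * (mu(s, p) - s * mu_prime(s, p)))

def exact_tail(n, p, a, upper):
    """Exact Pr[X_n >= a] (upper=True) or Pr[X_n <= a] (upper=False), X_n ~ Binomial(n, p)."""
    if upper:
        ks = range(math.ceil(a - 1e-9), n + 1)
    else:
        ks = range(0, math.floor(a + 1e-9) + 1)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks)

n, p = 100, 0.1
for s in (-1.0, -0.5, 0.5, 1.5):
    a = n * mu_prime(s, p)
    print(f"s={s:+.1f}  a={a:6.2f}  exact={exact_tail(n, p, a, s > 0):.3e}  "
          f"Chernoff={chernoff(n, s, p):.3e}")
```

The bound always lies above the exact tail, and both decay exponentially in n with the same exponent.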
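In our case the log moment generating function for $\Pr_1$, which will
be called $\mu_1(s)$, is given by
$$\mu_1(s) = \log \sum_{i,j} P_i p_i(j)\, e^{s \log \frac{p_i(j)}{Q_j}} = \log \sum_{i,j} P_i p_i(j) \left(\frac{p_i(j)}{Q_j}\right)^{s},$$
where $Q_j = \sum_i P_i p_i(j)$ is the probability of output letter j and the sum is over pairs with $p_i(j) > 0$.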
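With regard to the $\Pr_2$ measure and estimation of $\Pr_2[I \ge a]$, it is
still possible to use the Chernoff bound for s > 0, even though I has
"positive probability of being at $-\infty$." To see this, note that the moment
generating function $\nu_2(s)$ for the $\Pr_2$ measure is a well-defined function
for s > 0, namely
$$\nu_2(s) = \sum_{i,j} P_i Q_j \left(\frac{p_i(j)}{Q_j}\right)^{s}, \quad s > 0,$$
the pairs with $p_i(j) = 0$ (for which $e^{sI} = 0$) contributing nothing.
Furthermore, the generalized Chebyshev inequality with the monotone
increasing function $e^{sI}$ still holds for positive s:
$$e^{sa}\Pr_2[I \ge a] \le E[e^{sI}], \qquad \Pr_2[I \ge a] \le e^{-sa}E[e^{sI}], \quad s > 0.$$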
In particular, setting $a = n\mu_2'(s)$, where $\mu_2(s) = \log \nu_2(s)$ (this is the best
choice to give a good bound), we obtain
$$\Pr_2[I \ge n\mu_2'(s)] \le e^{n(\mu_2(s) - s\mu_2'(s))}.$$
Note also that (sums over pairs with $p_i(j) > 0$)
$$\nu_2(s) = \sum_{i,j} P_i p_i(j) \left(\frac{p_i(j)}{Q_j}\right)^{s-1} = e^{\mu_1(s-1)}, \quad 0 < s \le 1.$$
Thus the two generating functions have, in the common range of their
validity, a very simple functional relationship. It follows that $\mu_2'(s) =
\mu_1'(s-1)$. If we wish, in using the Chernoff bound, to place the cut-off
point for the tail, that is, a, at the same point, we must use $s_1$ and $s_2$
for $\mu_1$ and $\mu_2$ related by
$$a = n\mu_1'(s_1) = n\mu_2'(s_2).$$
This is achieved by making $s_2 = s_1 + 1$, since then $\mu_2'(s_2) = \mu_2'(s_1 + 1) =
\mu_1'(s_1 + 1 - 1) = \mu_1'(s_1)$. This is a unique solution, if we except the rather
degenerate case where I is constant, since it is easily shown that in
all other cases $\mu''(s)$ is positive. Using $s_2$ and $s_1$ related in this
manner in the Chernoff bounds, we have
$$\Pr_1[I \le n\mu_1'(s_1)] \le e^{n(\mu_1(s_1) - s_1\mu_1'(s_1))},$$
$$\Pr_2[I \ge n\mu_1'(s_1)] \le e^{n(\mu_1(s_1) - (s_1+1)\mu_1'(s_1))}, \quad s_2 = s_1 + 1 > 0.$$
Thus both bounds are now expressed in terms of $\mu_1$ and its derivative
with one parameter, $s_1$, which must be in the range $-1 < s_1 < 0$. Our
error probability $P_e$ is now bounded by (writing $e^{nR} = M$ and s for $s_1$,
since we no longer need the subscript)
$$P_e \le e^{n(\mu_1(s) - s\mu_1'(s))} + e^{n(R + \mu_1(s) - (s+1)\mu_1'(s))}, \quad -1 < s < 0.$$
This bound holds for any s in the allowed range. We wish to choose s
to roughly minimize the bound. This is done conveniently by equating
the exponents for the two terms, since the first is monotone increasing
in the range (its derivative is $-ns\mu_1''(s)$) while the other is monotone
decreasing (its derivative is $-n(s+1)\mu_1''(s)$). Thus we set
$$\mu_1(s) - s\mu_1'(s) = R + \mu_1(s) - (s+1)\mu_1'(s),$$
that is, $R = \mu_1'(s)$. With s chosen to satisfy this, the two terms are equal and therefore
$P_e$ reduces to twice the first term. Thus
$$P_e \le 2e^{n(\mu_1(s) - s\mu_1'(s))} \quad \text{if } R = \mu_1'(s), \; -1 < s < 0.$$
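The whole recipe (compute $\mu_1$, solve $R = \mu_1'(s)$, evaluate $2e^{n(\mu_1(s) - s\mu_1'(s))}$) fits in a few lines. The sketch below assumes a binary symmetric channel with crossover probability 0.1 and equal $P_i$, which is our example rather than the author's, uses natural logarithms so that R is in nats, and assumes $\mu_1'(-1) < R < \mu_1'(0)$ so that the bisection has a root to find. It also checks $\nu_2(s) = e^{\mu_1(s-1)}$ numerically.

```python
import math

# Hypothetical example: binary symmetric channel with crossover 0.1 and P_i = 1/2.
P = [0.5, 0.5]
p = [[0.9, 0.1], [0.1, 0.9]]
Q = [sum(P[i] * p[i][j] for i in range(2)) for j in range(2)]
PAIRS = [(i, j) for i in range(2) for j in range(2) if p[i][j] > 0]

def mu1(s):
    """mu_1(s) = log sum P_i p_i(j) (p_i(j)/Q_j)^s, natural logs."""
    return math.log(sum(P[i] * p[i][j] * (p[i][j] / Q[j]) ** s for i, j in PAIRS))

def mu1_prime(s):
    """Derivative of mu_1: average of log p_i(j)/Q_j under the tilted weights."""
    w = [P[i] * p[i][j] * (p[i][j] / Q[j]) ** s for i, j in PAIRS]
    x = [math.log(p[i][j] / Q[j]) for i, j in PAIRS]
    return sum(wk * xk for wk, xk in zip(w, x)) / sum(w)

def nu2(s):
    """Moment generating function of the per-letter I under Pr_2 (s > 0)."""
    return sum(P[i] * Q[j] * (p[i][j] / Q[j]) ** s for i, j in PAIRS)

def solve_s(R, iters=100):
    """Bisection for R = mu_1'(s) on (-1, 0); mu_1' is increasing there."""
    lo, hi = -1.0, 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mu1_prime(mid) < R:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def error_bound(R, n):
    """P_e <= 2 exp(n (mu_1(s) - s mu_1'(s))) with R = mu_1'(s)."""
    s = solve_s(R)
    return 2 * math.exp(n * (mu1(s) - s * mu1_prime(s)))

print(abs(nu2(0.5) - math.exp(mu1(-0.5))) < 1e-12)   # True: nu_2(s) = e^{mu_1(s-1)}
for n in (100, 500, 1000):
    print(n, error_bound(0.2, n))
```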
If $\mu_1'(-1) < R < \mu_1'(0) = \sum_{i,j} P_i p_i(j) \log \frac{p_i(j)}{Q_j}$, there will be a unique s
in the allowed range satisfying $R = \mu_1'(s)$. This may be seen by noting
that $\mu_1'(s)$ is a continuous monotone increasing function of s as s
ranges from -1 to 0. Furthermore, if there are no $p_i(j) = 0$, then $\mu_1'(-1) \le 0$.
This follows from the concavity of the logarithm
($\sum_k P_k \log x_k \le \log \sum_k P_k x_k$ for $\sum_k P_k = 1$); we have
$$\mu_1'(-1) = \sum_{i,j} P_i Q_j \log \frac{p_i(j)}{Q_j} \le \log \sum_{i,j} P_i Q_j \frac{p_i(j)}{Q_j} = \log \sum_{i,j} P_i p_i(j) = \log 1 = 0.$$
Hence, in this case, for each R from 0 to the mean
mutual information related to the probability assignment $P_i$ there is a
unique s.
If there are some $(i,j)$ pairs with $p_i(j) = 0$, it is possible to have
$\mu_1'(s)$ approach $R_0 > 0$ as s approaches -1. This happens, for example,
in the channel of Fig. 3,

FIG. 3

for which $R_0 = \log 1.2 > 0$. In such a case, the bound as written
above applies only between $R_0$ and $\mu_1'(0)$, the average mutual information
with the given probability assignment. We may, however, extend the
bound to lower rates by an argument similar to that in (1). For rates R
satisfying $0 \le R \le R_0$, choose for the a in the lemma a value less than
$nI_{\min}$, where $I_{\min}$ is the smallest $\log \frac{p_i(j)}{Q_j}$ among $(i,j)$ pairs for which
$p_i(j) > 0$ (that is, not including the "$-\infty$" cases). We then have that
$\Pr_1[I \le a] = 0$, since all cases with non-zero probability in this probability
measure give I values at least $nI_{\min}$. Also,
$$\Pr_2[I \ge a] = \Big(\sum_{SF} P_i Q_j\Big)^n,$$
where SF is the set of $(i,j)$ pairs for which $p_i(j) \ne 0$ and hence the
mutual information is finite. In fact, $(\sum_{SF} P_i Q_j)^n$ is the probability
that all corresponding letter pairs of u and v have finite mutual
information. If any pair fails, the u could not have been the true
cause of v, since that letter would have involved a transition of zero
probability. It follows that
$$P_e \le M\Big(\sum_{SF} P_i Q_j\Big)^n = e^{n\left(R + \log \sum_{SF} P_i Q_j\right)}.$$
Thus, in this region, the coefficient of n in the exponent is a linear
function of the rate R of unit slope and intercept $\log \sum_{SF} P_i Q_j$. It is
readily seen that this straight line is tangent to the curve of the previous
bound at the value $R = R_0$. However, the factor multiplying the exponential
in the bound on $P_e$ has improved from 2 to 1.
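The tangency can be checked numerically on a channel with zero transition probabilities. The sketch below assumes a binary erasure channel with erasure probability 0.2 and equal input probabilities (an illustrative choice, not the channel of Fig. 3). It computes $R_0 = \mu_1'(-1)$, compares the two exponents at $R_0$, and estimates the slope of the curved bound there, which should be close to 1.

```python
import math

EPS = 0.2                                  # hypothetical erasure probability
P = [0.5, 0.5]
p = [[1 - EPS, EPS, 0.0],                  # outputs: 0, erasure, 1
     [0.0, EPS, 1 - EPS]]
Q = [sum(P[i] * p[i][j] for i in range(2)) for j in range(3)]
SF = [(i, j) for i in range(2) for j in range(3) if p[i][j] > 0]   # finite-I pairs

def mu1(s):
    return math.log(sum(P[i] * p[i][j] * (p[i][j] / Q[j]) ** s for i, j in SF))

def mu1_prime(s):
    w = [P[i] * p[i][j] * (p[i][j] / Q[j]) ** s for i, j in SF]
    x = [math.log(p[i][j] / Q[j]) for i, j in SF]
    return sum(wk * xk for wk, xk in zip(w, x)) / sum(w)

def curve_exponent(s):
    """Coefficient of n in the curved bound: mu_1(s) - s mu_1'(s), at rate mu_1'(s)."""
    return mu1(s) - s * mu1_prime(s)

R0 = mu1_prime(-1.0)
intercept = math.log(sum(P[i] * Q[j] for i, j in SF))

def line_exponent(R):
    """Coefficient of n for the low-rate bound: R + log sum_SF P_i Q_j."""
    return R + intercept

h = 1e-6
slope = (curve_exponent(-1 + h) - curve_exponent(-1.0)) / (mu1_prime(-1 + h) - R0)
print(f"R0 = {R0:.4f}, curve = {curve_exponent(-1.0):.6f}, "
      f"line = {line_exponent(R0):.6f}, slope = {slope:.4f}")
```

For this channel $R_0 = \frac{1-\varepsilon}{1+\varepsilon}\log 2$, and both exponents equal $R_0 + \log\frac{1+\varepsilon}{2}$ at $R_0$.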
Of course, in the ensemble of codes there must exist particular
codes satisfying these same inequalities for error probability, since
there is always one member of an ensemble at least as good as the
average. Furthermore, if one were to choose samples from the ensemble
of codes with their corresponding probabilities, then with probability at
least $1 - 1/k$ a sample will have an error probability less than or equal to
$kP_e$ for any $k > 0$, $P_e$ being the ensemble average. For example, with probability at least .9, the sample
would have an error probability less than or equal to $10P_e$. This is
because the error probability is non-negative for each code, and if the probability of
exceeding $kP_e$ were more than $1/k$, the average would exceed $P_e$, a
contradiction.
Thus a code could be generated by a Monte Carlo process or by use
of a book of random numbers with high probability of not exceeding the
error probability bounds excessively.
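The Monte Carlo remark can be illustrated with a small experiment. Everything in the sketch below is an illustrative assumption (a binary symmetric channel, block length 5, four messages, 500 sampled codes): each sampled code's error probability is computed exactly under maximum-likelihood decoding, and the fraction of codes within k times the sample mean is compared with $1 - 1/k$. Markov's inequality applies to the sample itself, so that guarantee holds for the printed fractions whatever the seed.

```python
import itertools
import random

N_LEN, M, CROSS = 5, 4, 0.1    # hypothetical block length, number of messages, BSC crossover
OUTPUTS = list(itertools.product((0, 1), repeat=N_LEN))

def likelihood(u, v):
    d = sum(a != b for a, b in zip(u, v))
    return CROSS ** d * (1 - CROSS) ** (N_LEN - d)

def code_error_probability(code):
    """Exact average error probability of one code under maximum-likelihood
    decoding, ties going to the smallest-numbered message."""
    pe = 0.0
    for v in OUTPUTS:
        scores = [likelihood(u, v) for u in code]
        decoded = scores.index(max(scores))
        pe += sum(likelihood(u, v) for m, u in enumerate(code) if m != decoded) / M
    return pe

def random_code(rng):
    """Each message gets an independent, uniformly chosen code word."""
    return [tuple(rng.randint(0, 1) for _ in range(N_LEN)) for _ in range(M)]

rng = random.Random(1)
samples = [code_error_probability(random_code(rng)) for _ in range(500)]
mean_pe = sum(samples) / len(samples)
for k in (2, 10):
    frac = sum(pe <= k * mean_pe for pe in samples) / len(samples)
    print(f"k={k}: fraction with Pe <= {k} x mean = {frac:.3f}, guaranteed >= {1 - 1 / k:.2f}")
```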
Uniformly good codes
The bounds above refer, of course, to average error probability
over the different messages when all messages are used with equal
probability. It is also of interest to consider uniformly good codes
for which each message has a low error probability. From a code of
the first type it is possible to construct a uniformly good code with
slightly lower rate and poorer error probability. In fact, if we have
a code with $M_1$ messages and error probability less than or equal to
$P_{e1}$ (the messages used with equal probability), then at least half of the
messages must have individual error probabilities less than or equal
to $2P_{e1}$. This is the same combinatorial principle as used above. Thus
we can find a code with $M_1/2$ (or $(M_1 + 1)/2$ if $M_1$ is odd) messages and a
uniform error probability bound of $2P_{e1}$. This corresponds to a rate
of essentially $R - \frac{1}{n}\log 2$ and a reliability of $E - \frac{1}{n}\log 2$, where R
and E are those for the given code. In other words, the same R and
E curves apply if displaced in both coordinates by $\frac{1}{n}\log 2$, a quantity
which rapidly approaches zero as the code length n increases.
Such uniformly good codes have the desirable feature of preserving
the same bound on error probability even if the prior probabilities of
different messages are changed. Indeed, they may be used if such
message probabilities were entirely unknown or felt to be meaningless
or non-existent in a particular situation.
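The halving argument is easy to state as code. In this sketch the per-message error probabilities are made-up numbers, and we assume the original decoder is kept, so a received word decoded as a discarded message simply counts as an error. The helper keeps the $\lceil M/2 \rceil$ best messages, and the check confirms that each kept message is within twice the original average.

```python
import math
import random

def expurgate(per_message_pe):
    """Indices of the ceil(M/2) messages with the smallest error probabilities."""
    order = sorted(range(len(per_message_pe)), key=per_message_pe.__getitem__)
    return order[: (len(order) + 1) // 2]

rng = random.Random(0)
M = 101                                        # hypothetical odd number of messages
pe = [rng.random() ** 4 for _ in range(M)]     # hypothetical per-message error probabilities
avg = sum(pe) / M
kept = expurgate(pe)
print(len(kept), max(pe[m] for m in kept) <= 2 * avg)   # 51 True

n = 50                                         # hypothetical block length
print("rate loss per letter:", (math.log(M) - math.log(len(kept))) / n, "vs log(2)/n:", math.log(2) / n)
```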
Best bounds under variation of the $P_i$
The above bounds were deduced on the basis of an arbitrary
assignment $P_i$ of input letter probabilities. To obtain the strongest
results from these bounds one may, for any particular R, vary the
$P_i$ and attempt to find the set which gives the minimum bound on error
probability. Another way of looking at this is that the E(R) bounding
curves are found for all possible assignments and the envelope of
these is used. It may be readily shown that if the channel has the
"uniform input" property, then the best assignment is for all input
letters to have equal assigned probability. A channel has the uniform
input property if the output letters can be partitioned into a number of
subsets $S_1, S_2, \ldots$ such that each output letter in any subset $S_k$ has
the same set of transition probabilities coming into it and each input
letter has the same set of transition probabilities going into each subset $S_k$. A
simple example is the erasure channel if both letters have the same
probability of being erased.
Behavior near channel capacity
The first-order behavior of E, the coefficient of n in the error
probability exponent, for rates near channel capacity may be found by a
power series expansion of E(s) and R(s) about the point s = 0, where
$E(s) = s\mu_1'(s) - \mu_1(s)$ and $R(s) = \mu_1'(s)$. Thus
$$E(s) = E(0) + sE'(0) + \frac{s^2}{2}E''(0) + \cdots = 0 + s\big[s\mu_1''(s)\big]_{s=0} + \frac{s^2}{2}\big[\mu_1''(s) + s\mu_1'''(s)\big]_{s=0} + \cdots = \frac{s^2}{2}\mu_1''(0) + \cdots,$$
$$R(s) = R(0) + sR'(0) + \cdots = C + s\mu_1''(0) + \cdots,$$
where $R(0) = \mu_1'(0) = C$ when the $P_i$ are chosen to achieve capacity.
Eliminating s between these two relations we obtain
$$(R - C)^2 \approx s^2\big(\mu_1''(0)\big)^2 \approx 2E\,\mu_1''(0).$$
Thus, near channel capacity, the E(R) curve is approximately parabolic
with second derivative at C equal to $1/\mu_1''(0)$. It is readily shown that
$\mu_1''(0)$ is the variance of mutual information, and this approximation is
related to a central limit theorem normal approximation to the distribution
of mutual information near its mean. The approximate bound here
near channel capacity is the same as that in (1), the two curves
"osculating" at channel capacity and diverging appreciably only at
lower rates.
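The parabolic approximation can be compared with the exact parametric curve $E(s) = s\mu_1'(s) - \mu_1(s)$, $R(s) = \mu_1'(s)$. The sketch assumes a binary symmetric channel with crossover 0.1 and equal input probabilities (which achieve capacity for that channel), and computes $\mu_1''(0)$ directly as the variance of the per-letter mutual information.

```python
import math

P = [0.5, 0.5]
p = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical BSC, crossover 0.1
Q = [sum(P[i] * p[i][j] for i in range(2)) for j in range(2)]
PAIRS = [(i, j) for i in range(2) for j in range(2) if p[i][j] > 0]
X = {(i, j): math.log(p[i][j] / Q[j]) for i, j in PAIRS}   # per-letter information, nats

def tilted_moments(s):
    """Normalizer, mean and variance of X under weights P_i p_i(j) e^{s X}."""
    w = {k: P[k[0]] * p[k[0]][k[1]] * math.exp(s * X[k]) for k in PAIRS}
    total = sum(w.values())
    mean = sum(w[k] * X[k] for k in PAIRS) / total
    var = sum(w[k] * (X[k] - mean) ** 2 for k in PAIRS) / total
    return total, mean, var

def E_and_R(s):
    total, mean, _ = tilted_moments(s)
    mu, mu_p = math.log(total), mean        # mu_1(s) and mu_1'(s)
    return s * mu_p - mu, mu_p

_, C, var0 = tilted_moments(0.0)            # C = mu_1'(0), var0 = mu_1''(0)
for s in (-0.05, -0.1, -0.2):
    E, R = E_and_R(s)
    print(f"R={R:.4f}  E={E:.6f}  parabola={(R - C) ** 2 / (2 * var0):.6f}")
```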
(1) C. E. Shannon, "Certain Results in Coding Theory for Noisy Channels," Information and Control, Vol. 1, No. 1.
RELIABLE MACHINES FROM UNRELIABLE COMPONENTS
C. E. Shannon
These notes, taken by W. W. Peterson, cover the first
five lectures in the Seminar on Information Theory offered by
C. E. Shannon at M.I.T., Spring term 1956. The subject matter
is principally von Neumann's "Probabilistic Logics".
March, 1956
Bibliography:
von Neumann, Probabilistic Logics, Notes by R. S. Pierce on
lectures given at C.I.T., 1952. (To appear in Shannon
and McCarthy, Automata Studies, Princeton University
Press, 1956.)
Tsien, Engineering Cybernetics, McGraw-Hill, 1954.
The last chapter of this book is a condensed version
of part of von Neumann's paper.
Moskowitz and McLean, Some Reliability Aspects of System
Design, Rome Air Development Center Report RADC
TN-55-4.
Shannon and Moore, Reliable Circuits Using Less Reliable
Relays, Unpublished Bell Laboratories Report. (To
appear in Journal of the Franklin Institute sometime in
1956.)
1.1 Introduction
Even "near perfect" elements may not be adequate for extremely
complicated machines, or for machines whose failure might result in a
catastrophe. Consider a complex machine made up of $10^6$ components, each
of which has a probability of failure of $10^{-6}$ in any minute. This machine
would be expected to fail once each minute, even though each particular
component was expected to fail only once in ten years.
In case men's lives depend upon the successful operation of a
machine, it is difficult to decide on a satisfactorily low probability of failure,
and in particular, it may not be adequate to have men's fates depend upon the
successful operation of single components as good as they may be.
The following methods may be used to increase reliability:
1) Complete Redesign
For example, a digital computer may be used to
replace an analog computer in order to gain accuracy.
2) Improve Components
For example, most relays now have double con-
tacts and are several orders of magnitude more re-
liable than single contact relays.
3) Use Error Detecting Systems
For example, numbers may be represented in a
computer or data transmission system in the 2 of 5
code, in which numbers are represented by all
arrangements of two ones and three zeros in
five bit positions on a paper tape or other medium.
A component failure would probably result
in a character which had too many or too few ones,
and circuits which check the validity of the characters
would detect most errors. Another example
is the biquinary code, used in the arithmetic unit of
the Bell Laboratories Relay Computer. In this code,
numbers are represented by seven bits according to
the following scheme:

digit   code        digit   code
0       01 10000    5       10 10000
1       01 01000    6       10 01000
2       01 00100    7       10 00100
3       01 00010    8       10 00010
4       01 00001    9       10 00001

In the Bell Laboratories Relay Computer, error
detection was used to enable error correction by having
the machine check the validity of the coded numbers
after each operation and repeat any operation whose
result failed the check.
4) Error Correction
For example, though the individual neurons in the
brain fail, the brain usually continues to operate
without a serious failure for many years.
The fourth method of improving reliability is the subject of this part
of the course.
As an indication of the type of analysis that will be made, consider
an unreliable machine which has an input and an output which may be any one
of many symbols (for example, a digital computer whose output is a ten-digit
number):
Fig. 1.1 (machine with INPUT and OUTPUT)
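The two error-detecting codes of method 3) are simple enough to check mechanically. This sketch builds the biquinary words from the table and all ten 2 of 5 words (the text does not say which digit gets which 2 of 5 word, so none is assigned), then confirms that flipping any single bit, a model of one component failure, always produces an invalid word. All names are ours.

```python
from itertools import combinations

def biquinary(d):
    """Seven-bit biquinary word for decimal digit d, following the table."""
    quinary = ["0"] * 5
    quinary[d % 5] = "1"
    return ("01" if d < 5 else "10") + "".join(quinary)

def biquinary_valid(word):
    """Exactly one 1 among the two 'bi' bits and exactly one among the five 'quinary' bits."""
    return len(word) == 7 and word[:2].count("1") == 1 and word[2:].count("1") == 1

def two_of_five_valid(word):
    """Exactly two 1's in five positions."""
    return len(word) == 5 and word.count("1") == 2

def flip(word, k):
    """The word with bit k inverted (a single component failure)."""
    return word[:k] + ("1" if word[k] == "0" else "0") + word[k + 1:]

TWO_OF_FIVE = ["".join("1" if k in ones else "0" for k in range(5))
               for ones in combinations(range(5), 2)]

single_errors_caught = all(
    not check(flip(w, k))
    for words, check in ((TWO_OF_FIVE, two_of_five_valid),
                         ([biquinary(d) for d in range(10)], biquinary_valid))
    for w in words for k in range(len(w)))
print(len(TWO_OF_FIVE), biquinary(7), single_errors_caught)  # 10 1000100 True
```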
Also consider a perfect majority device (i.e., the majority device itself
never makes errors),
Fig. 1.2 (majority device M)
which has three inputs. There is no output unless two of the inputs agree,
in which case the common symbol is the output. Now consider three copies
of the original machine with their inputs taken from the same source and
their outputs connected to the majority device:
Fig. 1.3 (INPUT, three copies of the machine, majority device, OUTPUT)
If $p = 1 - q$ is the probability that the output of one of the devices is
correct, if the probabilities are independent, and if the probability that two
of the three erroneously agree is negligible, the probability that the device
shown in figure 1.3 will give the correct output is
$$P = p^3 + 3p^2q = (1-q)^3 + 3(1-q)^2q = 1 - 3q^2 + 2q^3,$$
and
$$Q = 1 - P = 3q^2 - 2q^3. \qquad (1.1)$$
If q is small, Q may be much smaller, while if q is large, Q may
be considerably larger. For example, if $q = 10^{-6}$, $Q \approx 3 \times 10^{-12}$, while if
$q = 0.7$, $Q = 0.784$. If $q = 1 - 10^{-6}$, then $Q \approx 1 - 3 \times 10^{-12}$.
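Equation (1.1) can be checked numerically, and against a quick simulation that counts the trials in which at least two of the three independent copies fail. The trial count and seed are arbitrary choices.

```python
import random

def majority_error(q):
    """Q = 3q^2 - 2q^3 of equation (1.1): at least two of the three copies fail."""
    return 3 * q**2 - 2 * q**3

def simulate(q, trials, rng):
    """Monte Carlo estimate of the same probability, copies failing independently."""
    bad = 0
    for _ in range(trials):
        failures = sum(rng.random() < q for _ in range(3))
        bad += failures >= 2
    return bad / trials

rng = random.Random(0)
for q in (1e-6, 0.1, 0.7):
    print(f"q={q:g}  Q={majority_error(q):.6g}")
print(f"simulated Q at q=0.1: {simulate(0.1, 200_000, rng):.4f}")
```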
Frequently a complex device is made up of many devices connected
in cascade:
Fig. 1.4
Instead of the system considered in Figure 1.3, consider the following
system:
Fig. 1.5
This system is at least as good as that shown in Fig. 1.3,
because that system requires that all the blocks in two rows function
correctly. This system will give the correct result in that case and
in many others.
If the error probability for each of the four parts of the
machine shown in figure 1.4 is $10^{-3}$, the error probability for the
four in cascade is $q = 1 - (1 - 10^{-3})^4 \approx 4 \times 10^{-3}$. The majority organ would
correct this to $Q = 3q^2 - 2q^3 \approx 48 \times 10^{-6}$. Using the scheme of figure 1.5,
for each stage $q = 10^{-3}$, and hence $Q \approx 3 \times 10^{-6}$. Four stages cascaded
will give an overall probability of error $1 - (1 - 3 \times 10^{-6})^4 \approx 12 \times 10^{-6}$, which
is one fourth that obtained by the other system. If the probability for
each part of the machine is taken as 0.2 instead of $10^{-3}$, the resulting
error probabilities for the systems shown in figures 1.3 and 1.5 are
.51 and .38 respectively.
The poor features of this system compared to that of Fig. 1.3 are 1) the
cost of the majority devices, and 2) that in practice majority devices
are not perfect, and they introduce errors also. If the machine is broken
down into small enough parts, the majority devices may introduce more
errors than they correct.
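The comparison between the two arrangements reduces to composing two formulas: $1 - \prod_k (1 - q_k)$ for independent stages in cascade, and (1.1) for a perfect majority organ. The sketch assumes, as the figures suggest, that Fig. 1.3 votes once on the output of the whole chain while Fig. 1.5 places a perfect majority organ after each stage, and it reproduces the $10^{-3}$ case; the function names are ours.

```python
def majority_error(q):
    """Equation (1.1) for a perfect majority organ fed by three independent copies."""
    return 3 * q**2 - 2 * q**3

def cascade_error(stage_errors):
    """Error probability of independent stages in cascade: 1 - prod(1 - q_k)."""
    ok = 1.0
    for q in stage_errors:
        ok *= 1 - q
    return 1 - ok

def vote_on_whole_chain(q_part, stages):
    """Fig. 1.3 arrangement applied to the chain of Fig. 1.4."""
    return majority_error(cascade_error([q_part] * stages))

def vote_after_each_stage(q_part, stages):
    """Fig. 1.5 arrangement: a majority organ after every stage."""
    return cascade_error([majority_error(q_part)] * stages)

whole, per_stage = vote_on_whole_chain(1e-3, 4), vote_after_each_stage(1e-3, 4)
print(f"{whole:.3e} {per_stage:.3e} ratio={per_stage / whole:.3f}")
```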
1.2 von Neumann's Probabilistic Logics
The basic scheme used by von Neumann is to design the desired
automaton from some sort of idealized components and then to describe a
way of transforming this into a reliable automaton built from unreliable
components.
The ideal components have a number of inputs and one output,
as in the following diagram:
Fig. 1.6
The output variable y and all the input variables take on only the values
0 and 1. The output is a function of the input variables,
$$y = f(x_1, x_2, x_3, x_4),$$
but it is delayed by a time $\delta$. In the following analysis all elements
will be assumed to have the same delay, and this will be used as the
unit of time.
In designing an automaton from these elements it is assumed
that any output can be branched and connected to any number of inputs,
but that two outputs are never connected together.
These ideal elements might be thought of as idealized neurons
or computer logical circuits, but we will consider them simply as a
mathematical model without any particular interpretation.
The following special type of ideal element is particularly useful:
Fig. 1.7 (excitatory inputs; inhibitory inputs)
The device may have any number of excitatory inputs and any number of
inhibitory inputs. The device has a one as output only when
$$N_e - N_i \ge h,$$
where $N_e$ is the number of excitatory inputs receiving 1's, $N_i$ is the
number of inhibitory inputs receiving 1's, and h is the threshold of the element. A bus for supplying constant
1's and one for constant zeros will be assumed.
Devices like these can be analyzed by the propositional calculus¹,
which deals with variables and with the polynomials made from these variables
by combinations of the "and", "or", and "not" operations. These
operations are defined as follows:

Name     Symbol             Truth table
"and"    $x_1 \cdot x_2$    (x1, x2) = 00 → 0, 01 → 0, 10 → 0, 11 → 1
"or"     $x_1 + x_2$        (x1, x2) = 00 → 0, 01 → 1, 10 → 1, 11 → 1
"not"    $x_1'$             x1 = 0 → 1, x1 = 1 → 0
One noteworthy theorem from the propositional calculus states that
any polynomial can be written uniquely in the following canonical form:
$$p(x_1, x_2, \ldots, x_n) = \sum_{i_1, i_2, \ldots, i_n = 0}^{1} a_{i_1 i_2 \ldots i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}, \qquad (1.3)$$
where $x^0 = x'$ and $x^1 = x$, and the sum and products stand for the "or" and "and" operations.
The coefficients $a_{i_1 i_2 \ldots i_n}$ are essentially the
truth table for p, and the proof consists essentially of noting that if the truth
table of a polynomial p is used as coefficients, the canonical form will agree
with p.
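The proof idea, using the truth table as the coefficients, translates directly into code. The helper below is a sketch with names of our choosing: it evaluates the canonical form with "or" as the sum and "and" as the product, where $x^1 = x$ and $x^0 = x'$, so each term is 1 only at its own row of the truth table.

```python
from itertools import product

def canonical_form(truth_table, n):
    """Return a function evaluating sum over (i_1..i_n) of a_{i_1..i_n} x_1^{i_1} ... x_n^{i_n},
    with x^1 = x, x^0 = not x, "sum" = or and "product" = and.
    truth_table maps each n-tuple of 0/1 values to p's value there (the coefficients a)."""
    def evaluate(x):
        result = 0
        for idx in product((0, 1), repeat=n):
            term = truth_table[idx]
            for xk, ik in zip(x, idx):
                literal = xk if ik == 1 else 1 - xk
                term &= literal
            result |= term
        return result
    return evaluate

def majority3(a, b, c):
    return int(a + b + c >= 2)

table = {x: majority3(*x) for x in product((0, 1), repeat=3)}
g = canonical_form(table, 3)
print(all(g(x) == majority3(*x) for x in product((0, 1), repeat=3)))  # True
```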
1. Couturat, The Algebra of Logic, Paris, 1905.
Birkhoff and Mac Lane, A Survey of Modern Algebra, Macmillan, 1953.
Ideal elements which do the "and", "or", and "not" operations
can be formed as follows:
Fig. 1.8 ("and", "or", "not")
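Since Fig. 1.8 is not reproduced, the thresholds used below are our assumption of one standard way to build these elements from the threshold device described above, taking its rule to be $N_e - N_i \ge h$: "and" as two excitatory inputs with h = 2, "or" with h = 1, and "not" as an excitatory input from the constant-1 bus together with an inhibitory input x and h = 1. The three-input majority organ, used later, is the case h = 2.

```python
def threshold_element(excitatory, inhibitory, h):
    """1 exactly when (number of excitatory 1's) - (number of inhibitory 1's) >= h."""
    return int(sum(excitatory) - sum(inhibitory) >= h)

ONE = 1  # the constant-1 bus

def and_gate(x, y):
    return threshold_element([x, y], [], h=2)

def or_gate(x, y):
    return threshold_element([x, y], [], h=1)

def not_gate(x):
    # excitatory input tied to the constant-1 bus, x on an inhibitory input
    return threshold_element([ONE], [x], h=1)

def majority(a, b, c):
    return threshold_element([a, b, c], [], h=2)

for x in (0, 1):
    print(x, not_gate(x), [and_gate(x, y) for y in (0, 1)], [or_gate(x, y) for y in (0, 1)])
```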
From these, a device with an arbitrary function as output can be built. This can
be proved by induction on the number of "and", "or", and "not" operations
in the expression. For n = 1, the function can be formed with one of
the basic elements shown in figure 1.8. Assuming that the statement is
true for all functions containing no more than n operations, the device for
a function with n + 1 operations can be constructed as follows: consider the
(n+1)st operation, the last one performed. Its operand(s) certainly contain no more
than n operations, and therefore devices can be constructed which correspond to them.
The outputs from these devices can be combined using the basic element
corresponding to the (n+1)st operation to give the required device.
Any function can be generated in this manner, but there will be
a delay. For any given function there is a minimum delay. The same function
can be obtained with any greater delay by
using any number of "and" circuits as unit delay elements.
Fig. 1.9 - Delay Element
In order to simplify the mathematical analysis, we wish to reduce
the number of types of elements to a minimum. By using De Morgan's
Theorem,
$$(x + y)' = x'y', \quad \text{or} \quad x + y = (x'y')',$$
the "or" can be obtained from "and" and "not" elements. Thus
Fig. 1.10 ("or" with 3 units delay; "and" with 3 units delay; "not" with 3 units delay)
and these could be used as basic elements. Similarly, "and" can be
obtained from "or" and "not".
The "not" operation cannot be obtained from "or" and "and" ,
however. The "and" and "or" operations are monotone, i.e. increas-
ing one of the arguments from 0 to 1 never causes the result to decrease
from 1 to 0. Any combination of monotone operations will result in a
monotone function. Hence "not", which is not monotone, cannot be obtained
from any combination of "and" and "or" operations.
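The monotonicity argument can also be tested by brute force. This sketch checks monotonicity one input at a time, which is enough because any increase of the inputs is a chain of single 0-to-1 changes, and applies the check to randomly generated formulas built only from "and", "or", variables, and the constant buses. The random formula generator is a hypothetical helper for illustration.

```python
import random
from itertools import product

def is_monotone(f, n):
    """True if raising any single input from 0 to 1 never lowers f's output."""
    for x in product((0, 1), repeat=n):
        for k in range(n):
            if x[k] == 0:
                y = x[:k] + (1,) + x[k + 1:]
                if f(*x) > f(*y):
                    return False
    return True

def random_and_or_formula(depth, n, rng):
    """Random formula over x_0..x_{n-1} and the constants 0, 1 using only and/or."""
    if depth == 0 or rng.random() < 0.3:
        leaf = rng.randrange(n + 2)
        if leaf < n:
            return lambda *x, k=leaf: x[k]
        return lambda *x, c=leaf - n: c
    left = random_and_or_formula(depth - 1, n, rng)
    right = random_and_or_formula(depth - 1, n, rng)
    if rng.random() < 0.5:
        return lambda *x: left(*x) & right(*x)
    return lambda *x: left(*x) | right(*x)

rng = random.Random(0)
print(all(is_monotone(random_and_or_formula(4, 3, rng), 3) for _ in range(200)),
      is_monotone(lambda a: 1 - a, 1))   # True False
```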
There is another way of organizing an automaton which does
make "and" and "or" sufficient for obtaining all polynomials. It is the
"double line trick", in which each variable is represented by two lines.
A one is represented by a 1 on the first line of the pair and a 0 on the
second, while a zero is represented by the opposite. With this convention,
the "and", "or", and "not" can be obtained from "and" and "or"
elements as follows:
(figure: double-line "not" and "and" circuits)
The "or" can be obtained by using De Morgan's Theorem, i.e., by twisting
both input lines and the output line. (It turns out that this is equivalent to
interchanging the basic "and" and "or" elements in the diagram of the "and"
circuit.) Note that the "not" circuit needs a delay of one unit to make it
have the same delay as the "and" and "or".
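A sketch of the double line trick, assuming the representation just described (logical 1 as the pair (1, 0), logical 0 as (0, 1)). Only "and" and "or" are applied to individual lines; "not" merely crosses the pair, and the unit delay that the hardware version needs is not modeled here.

```python
# Double-line representation: logical 1 -> (1, 0), logical 0 -> (0, 1).
def encode(bit):
    return (bit, 1 - bit)

def decode(pair):
    return pair[0]

def dl_and(a, b):
    # first line: a1 and b1; second line: a2 or b2, since (xy)' = x' + y'
    return (a[0] & b[0], a[1] | b[1])

def dl_or(a, b):
    # De Morgan: the same circuit with the "and" and "or" elements interchanged
    return (a[0] | b[0], a[1] & b[1])

def dl_not(a):
    # cross the two lines (hardware would add a unit delay on each line)
    return (a[1], a[0])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, dl_and(encode(a), encode(b)), dl_or(encode(a), encode(b)), dl_not(encode(a)))
```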
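It was discovered by Sheffer that there are single functions
from which all of these, "and", "or", and "not", can be obtained. One is the
Sheffer stroke function:
$$(x \mid y) = (xy)' = x' + y', \qquad (1.4)$$
with truth table (x, y) = 00 → 1, 01 → 1, 10 → 1, 11 → 0.
In terms of it,
$$x' = (x \mid x), \qquad x \cdot y = (x \mid y) \mid (x \mid y), \qquad x + y = (x \mid x) \mid (y \mid y). \qquad (1.5)$$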
The stroke function can be represented by an element of the
following type:
Fig. 1.12
"And", "or", and "not" circuits can be built from Sheffer stroke
elements according to the above formulas. Note, however, that the
"not" circuit requires one stroke function and hence only one unit of
delay, while the "and" and "or" circuits require two stroke functions
cascaded, and hence two units of delay. This time the delay cannot be
equalized. The stroke function is anti-monotone. In a device made
from stroke functions, any point removed from the input by one stroke
element will be anti-monotone. Any points two levels deep are monotone,
etc. Thus the "not", which is anti-monotone, cannot be obtained
at the same level, and hence time delay, as the "and" and "or", which
are monotone.
Since "and" and "or" can be obtained at the same level, we
can use the double-line trick to obtain "and", "or", and "not" in terms
of stroke elements.
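The stroke formulas above can be checked in a few lines. Each function below also reports its depth in stroke elements, which makes the unequal delays of "not" and of "and"/"or" explicit. The helper names are ours.

```python
def stroke(x, y):
    """Sheffer stroke (x|y) = (xy)'."""
    return 1 - (x & y)

# Each function returns (value, delay in units), one unit per level of stroke elements.
def not_(x):
    return stroke(x, x), 1

def and_(x, y):
    s = stroke(x, y)
    return stroke(s, s), 2

def or_(x, y):
    return stroke(stroke(x, x), stroke(y, y)), 2

for x in (0, 1):
    for y in (0, 1):
        print(x, y, not_(x), and_(x, y), or_(x, y))
```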
Another element of interest is the "majority organ":
Fig. 1.13
It is monotone, but the "and" and "or" can be obtained from it as follows:
Fig. 1.14 ($x \cdot y$ and $x + y$)
and hence the majority organ is universal when used with the double
line trick.
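Fig. 1.14 is not reproduced, so the wiring below is an assumption: a majority organ with its third input tied to the constant-0 bus gives "and", and tied to the constant-1 bus gives "or". Combined with the double line trick, majority organs alone then give any function; the sketch builds exclusive-or as an example.

```python
def majority(a, b, c):
    return int(a + b + c >= 2)

ZERO, ONE = 0, 1  # constant buses

def and_(x, y):
    return majority(x, y, ZERO)

def or_(x, y):
    return majority(x, y, ONE)

# Double line trick: a variable is the pair (x, x'); "not" just crosses the lines.
def dl(bit):
    return (bit, 1 - bit)

def dl_and(a, b):
    return and_(a[0], b[0]), or_(a[1], b[1])

def dl_or(a, b):
    return or_(a[0], b[0]), and_(a[1], b[1])

def dl_not(a):
    return a[1], a[0]

def dl_xor(a, b):
    """x y' + x' y, built from majority organs and crossed lines only."""
    return dl_or(dl_and(a, dl_not(b)), dl_and(dl_not(a), b))

print([dl_xor(dl(a), dl(b))[0] for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```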
From the elements we have discussed we can build black boxes
of the following kind, with any given set of propositional functions
$$f_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, m:$$
Fig. 1.15 (inputs $x_1, x_2, \ldots, x_n$; outputs $f_1(x_1, \ldots, x_n), f_2(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)$)
and we can line up the outputs by using delay elements. The notation
can be shortened by writing X for the vector $(x_1, x_2, x_3, \ldots, x_n)$ and
F(X) for the vector function $(f_1, f_2, \ldots, f_m)$.
A more general type of machine has outputs which depend
not only on the input but also in some way upon the previous history of
the device. One very general model of such a device is the "finite state
transducer", which is a satisfactory representation of a digital computer,
for example. At each time interval i it is given an input (vector) $X^i$, it
has a state (vector) $S^i$ which can assume a finite number of possible
values, and it produces an output (vector) $Y^i$:
Fig. 1.16
They are related by the following equations:
$$S^{i+1} = f(S^i, X^i), \qquad Y^i = g(S^i, X^i). \qquad (1.6)$$
The relationship between finite state transducers and
devices made of the basic elements is made clear by the following
two theorems:
THEOREM: Any device made by combining a finite number of basic
elements is a finite state transducer.
If the output of the j-th element at time i is denoted by $s_j^i$, then
certainly $s_j^i$ is a function of $S^{i-1}$ and the input, and the $s_j^i$ can be interpreted as
the components of the state vector $S^i$. Then the output (vector) $Y^i$ is certainly
a function of $S^i$ and $X^i$, since the outputs must come either from an element
or directly from the input.
THEOREM: Given the equations for a finite state transducer, such a transducer
can be built of basic "and", "or", and "not" elements (or any other set
of universal components), except that the interval of time between inputs and
between outputs will be some multiple of the unit delay, and the outputs Y
may be delayed by some multiple of the unit delay.
Suppose there are k states. They can be represented by the
binary numbers from zero to k - 1, and the binary digits of these numbers can
be used as the components of the state vector S. Then the original given
equations for the transducer become equations in binary variables:
$$S^{i+1} = f(S^i, X^i), \qquad Y^i = g(S^i, X^i).$$
Black boxes of the type shown in Fig. 1.15 can be made corresponding to
each of these equations, but they will have delays. Suppose each output
from the first is delayed $d_1$ units, and each output from the second $d_2$ units.
Then the finite state transducer is obtained by connecting them as follows:
Fig. 1.17
If an input is entered at times $0, d_1, 2d_1, 3d_1$, etc., the inputs will
synchronize with the state variables coming out of the f box to satisfy
the first equation, and the second box will produce the outputs according
to the second equation, at times $d_2, d_1 + d_2$, etc.
(Actually, the machine could simultaneously process $d_1$ input
sequences starting at times $0, 1, 2, \ldots, d_1 - 1$ respectively and produce the
$d_1$ output sequences similarly meshed.)
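A finite state transducer is easy to simulate directly from the two equations $S^{i+1} = f(S^i, X^i)$ and $Y^i = g(S^i, X^i)$. The running-parity machine below is a hypothetical example chosen for brevity, not one from the notes.

```python
from typing import Callable, List, Sequence

def run_transducer(f: Callable[[int, int], int], g: Callable[[int, int], int],
                   s0: int, inputs: Sequence[int]) -> List[int]:
    """Simulate S^{i+1} = f(S^i, X^i), Y^i = g(S^i, X^i) from the initial state s0."""
    s = s0
    outputs = []
    for x in inputs:
        outputs.append(g(s, x))
        s = f(s, x)
    return outputs

# Hypothetical example: the state is the parity of the inputs seen so far,
# and the output is the parity including the current input.
def parity_next_state(s: int, x: int) -> int:
    return s ^ x

def parity_output(s: int, x: int) -> int:
    return s ^ x

print(run_transducer(parity_next_state, parity_output, 0, [1, 0, 1, 1, 0]))  # [1, 1, 0, 1, 1]
```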
Problems:
1. a. Prove that it is possible to build a device with two inputs and
one output which produces the sum of the input binary numbers
for numbers of arbitrary length.
b. Prove that this is not possible for multiplication. (It is also
not possible for square root.)
2. Design a device from an infinite number of "and", "or", and
"not" elements which is equivalent to a universal Turing
machine.
The following are examples of types of machines which are
not finite state transducers:
1) A device which has an infinite number of states, for
example, a Turing machine with its infinite tape.
2) A device in which continuous variables occur, and hence
there are again an infinite number of states. An analog computer is an
example. In cases of this type the variables usually have strong conti-
nuity conditions and can be approximated to any desired degree by quantized
variables. Hence the machine can often be approximated by a finite state
transducer. (For example, a digital differential analyzer approximates
an analog differential analyzer.)
3) A device which contains a random element, for example,
a computer which makes unpredictable errors. A device with a random
element might be very useful, for example in playing games of strategy
in which a mixed strategy is called for.
Now we shall consider automata constructed of basic elements
which sometimes fail, with the failures occurring according to some prob-
ability measure. We could assume a completely general probability meas-
ure on the space of all parts of the machine, i. e. , we could include all sorts
of correlation. Some correlation certainly occurs in real machines. Vacuum
tube failures, for example, are frequently the result of the application of improper
voltages. Since the voltages are usually applied to many tubes, a number of
failures may result from one occurrence of improper voltage, and hence
correlation appears among the failures. Likewise, in a relay machine most
failures result from dust. The fact that one relay fails is an indication that
dust is present and other relays are likely to fail also.
To assume a completely general probability measure would
make the problem so difficult mathematically that we could hardly ex-
pect to accomplish anything.
We shall assume that the errors which occur in the different
basic elements are independent. We shall use majority organs as basic
elements and assume that the probability of erroneous output is $\epsilon$,
regardless of the number of 1's at the input. (A physical realization of the
majority organ might not have this property. It might be more reliable
when the inputs are zeros than when they are ones, or it might be more
reliable when all inputs are alike than with two ones and a zero or vice
versa.) (A possible generalization of this would be to assume that the
probability of failure of any element is less than $\epsilon$ regardless of the
number of 1's at the input and regardless of the state of other parts of
the machine.)
If the output from the machine appears on one line, the probability
of error of the output is at least $\epsilon$ (except in the trivial cases in
which it comes directly from the input or from a zero or one bus) simply
because the output must come from a majority organ which has probability
of error $\epsilon$.
If $\eta_1, \eta_2, \eta_3$ are upper bounds on the error probabilities
of the three inputs to a majority organ, then the probability of error $\eta^*$ for
the output of the majority organ satisfies the inequality
$$\eta^* \le \epsilon + \eta_1 + \eta_2 + \eta_3 \qquad (1.7a)$$
This gives an absolute upper bound on the error probability, regardless
of any correlations which may exist. It does not offer any hope of im-
provement since it never promises any decrease in error probability at
the output of the majority organ.
If we assume, 1) that these probabilities are independent, and,
2) that the three inputs agree if they are correct, a stronger result can
be obtained. The probability that at least two of the inputs are incorrect
is then
$$\theta = \eta_1\eta_2(1-\eta_3) + \eta_1\eta_3(1-\eta_2) + \eta_2\eta_3(1-\eta_1) + \eta_1\eta_2\eta_3 = \eta_1\eta_2 + \eta_1\eta_3 + \eta_2\eta_3 - 2\eta_1\eta_2\eta_3 \qquad (1.7b)$$
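A quick numerical check of (1.7b), assuming independent inputs: summing the probabilities of all eight error patterns in which two or three inputs are wrong reproduces the closed form. The helper names are illustrative only.

```python
from itertools import product


def at_least_two_wrong(e1, e2, e3):
    """Sum, over all 2**3 error patterns, the probability of the patterns
    in which two or three of the independent inputs are wrong."""
    total = 0.0
    for pattern in product((0, 1), repeat=3):      # 1 marks a wrong input
        prob = 1.0
        for wrong, e in zip(pattern, (e1, e2, e3)):
            prob *= e if wrong else 1 - e
        if sum(pattern) >= 2:
            total += prob
    return total


def closed_form(e1, e2, e3):
    """Right-hand side of (1.7b)."""
    return e1 * e2 + e1 * e3 + e2 * e3 - 2 * e1 * e2 * e3


print(at_least_two_wrong(0.1, 0.2, 0.3), closed_form(0.1, 0.2, 0.3))
```

With all three probabilities equal to $\eta$ the closed form reduces to $3\eta^2 - 2\eta^3$, the quantity used in (1.9).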
The probability of an error in the output is the probability that either
1) at least two inputs are incorrect and the majority organ works properly,
or 2) at least two inputs are correct and the majority organ makes
an error (but not both); hence,
$$\eta^* = \theta(1-\epsilon) + \epsilon(1-\theta). \qquad (1.8)$$
If $\eta_1 = \eta_2 = \eta_3 = \eta$, then $\theta = 3\eta^2 - 2\eta^3$, and
$$\eta^* = \epsilon + (1-2\epsilon)(3\eta^2 - 2\eta^3). \qquad (1.9)$$
Now consider a machine which makes errors. Make three identical copies and connect the outputs to a majority organ:
Fig. 1.18
Errors in the outputs can now be considered independent because they
occur in different machines. Also, the outputs will agree if they are
correct. Hence the above formula applies. But if this is to work, the
output error probability $\eta^*$ must be less than the input error probability
$\eta$.
From equation (1.9) it can be shown that $\eta^*$, considered as a
function of $\eta$, passes through the point (1/2, 1/2). It has zero slope when
$\eta = 0$ or 1. The graph looks something like this:
The curve is tangent to the diagonal at the center for $\epsilon = 1/6$, since its slope
there is $\tfrac{3}{2}(1-2\epsilon)$. In order for $\eta^*$ to be less than $\eta$, the curve must lie below
the diagonal for $\eta$ just below 1/2, and hence $\epsilon$ must be less than 1/6. The curve for
$\epsilon = 1/12$ crosses the diagonal at $\eta = 1/2$ and also at $\eta = (5 - \sqrt{15})/10 \approx 0.11$
and $\eta = (5 + \sqrt{15})/10 \approx 0.89$. For $\eta < 0.11$, $\eta^* > \eta$ and the error probability is
increased. For $0.11 < \eta < 1/2$, on the other hand, $\eta^* < \eta$ and the
error probability is decreased. In either case iterating the procedure
makes the error probability approach 0.11. Thus this crossing acts
as a kind of stable point.
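The stable crossing can be checked by iterating (1.9) directly. This minimal sketch assumes $\epsilon = 1/12$ as in the text; for that value the fixed-point condition factors as $(2\eta - 1)(10\eta^2 - 10\eta + 1) = 0$, so the lower crossing is $(5 - \sqrt{15})/10 \approx 0.113$. The function name is hypothetical.

```python
import math


def majority_output_error(eta, eps):
    """Equation (1.9): output error of a majority organ (error prob. eps)
    fed by three independent inputs, each wrong with probability eta."""
    return eps + (1 - 2 * eps) * (3 * eta**2 - 2 * eta**3)


eps = 1 / 12
for start in (0.01, 0.40):
    eta = start
    for _ in range(200):          # repeat the triplicate-and-vote step
        eta = majority_output_error(eta, eps)
    print(f"start {start:.2f} -> {eta:.4f}")

print((5 - math.sqrt(15)) / 10)   # lower crossing, about 0.113
```

Starting below the crossing the error probability rises to it; starting between it and 1/2 it falls to it.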
Now let us consider in more detail the design of a machine.
Consider a machine (designed on the assumption of error-free components)
which has no feedback in it. It would be of the type shown in
Fig. 1.15.
Theorem. Given an error-free design $Q$ with no feedback, we can construct
an equivalent machine $Q^*$ (with added delay). Each element of
$Q^*$ has error probability $\epsilon$, but the whole machine has error probability
less than $\eta_1(\epsilon)$. $\eta_1(\epsilon)$ is independent of machine complexity
and approaches zero as $\epsilon$ approaches zero.
The proof is by an induction on the depth n of the machine.
The theorem is obviously true for n = 0, since that would mean all outputs
come directly from inputs or zero or one buses. Assume that it
is true for n = k, and consider a machine of depth k + 1. Since there is
no feedback, all the outputs from the majority organs at the greatest
depth must be connected only to outputs of the machine. If these elements
are removed, the rest of the machine will have depth k:
Fig. 1.20
Now by the induction hypothesis we can build this machine
of depth k with error probability less than $\eta_1(\epsilon)$. Build three copies
of it. Then connect each set of corresponding outputs to a majority
organ. Finally connect these outputs to the (k + 1)th layer of majority
organs:
Fig. 1.21
The error probability at the outputs of the devices of depth k
is less than $\eta_1(\epsilon)$ by the induction hypothesis. Also, the three boxes
are independent, and the outputs on corresponding lines, which go to a
correcting majority organ, will agree if they are correct. Therefore,
a bound on probability of error at the output of the correcting organs is
given by equation (1.9):
$$\eta^* = \epsilon + (1-2\epsilon)(3\eta^2 - 2\eta^3)$$
The probabilities at the inputs to the computing organs are less than $\eta^*$,
but they are not necessarily independent, nor need they agree if correct.
Therefore, we use equation (1.7a):
$$P_e \le 3\eta^* + \epsilon \qquad (1.10)$$
Combining these equations, we find
$$P_e \le 4\epsilon + 3(1-2\epsilon)(3\eta^2 - 2\eta^3) \qquad (1.11)$$
for the probability of error at the output of the device as shown in
Fig. 1.21. In order to complete the proof we have to ensure that
$P_e < \eta_1(\epsilon)$ whenever $\eta < \eta_1(\epsilon)$, and it is this requirement that defines the function $\eta_1(\epsilon)$.
The curve of equation (1.11) is similar to the curve of
equation (1.9), except that the critical $\epsilon$ at which the curve becomes tangent
to the diagonal is approximately 0.0073:
Fig. 1.22
For smaller $\epsilon$ the curve crosses the diagonal at two points $\eta_0 < \eta_0'$, both well
below 1/2. Clearly, if $\beta$ is any number such that $\eta_0 \le \beta \le \eta_0'$, then whenever
$\eta < \beta$, $P_e$ will be less than $\beta$ also, because the curve is increasing and lies on or
below the diagonal between the two crossings. Therefore the function $\eta_1(\epsilon)$ can be any
function which satisfies the inequality $\eta_0 \le \eta_1(\epsilon) \le \eta_0'$ for all $\epsilon$. In particular
$\eta_1(\epsilon) = \eta_0$ is acceptable.
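The critical value 0.0073 can be reproduced numerically. The sketch below is our own construction, not taken from the notes: the right side of (1.11) minus $\eta$ is smallest where the slope $18(1-2\epsilon)\eta(1-\eta)$ equals 1, and we bisect on $\epsilon$ for the value at which that minimum just touches zero. Function names are hypothetical.

```python
import math


def bound_1_11(eta, eps):
    """Right-hand side of (1.11): 4*eps + 3*(1-2*eps)*(3*eta**2 - 2*eta**3)."""
    return 4 * eps + 3 * (1 - 2 * eps) * (3 * eta**2 - 2 * eta**3)


def gap_at_minimum(eps):
    """Minimum of bound_1_11(eta) - eta over small eta.

    The minimum lies where 18*(1-2*eps)*eta*(1-eta) = 1. A negative value
    means the curve dips below the diagonal (two crossings exist).
    """
    k = 1 / (18 * (1 - 2 * eps))
    eta_m = (1 - math.sqrt(1 - 4 * k)) / 2
    return bound_1_11(eta_m, eps) - eta_m


def critical_eps(lo=0.0, hi=0.05, iters=60):
    """Bisect for the eps at which (1.11) is tangent to the diagonal."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap_at_minimum(mid) < 0:
            lo = mid          # still crosses the diagonal
        else:
            hi = mid
    return lo


print(round(critical_eps(), 4))   # about 0.0073
```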
Note that the fact that there is no feedback plays a part in
this proof.
One variation on this system would be to iterate this triplicating
a number of times at each level of depth of the device. This
will permit using majority organs which have error probabilities greater
than 0.0073; it is possible to have $\epsilon$ at least as large as 0.125, and
probably very near 1/6.
Adding one level of depth triplicates all previous equipment
and adds some, so that the redesigned machine contains much more
than $3^n$ times the amount of equipment involved in the first level of depth.
Even for modest values of n, this makes a fantastically large machine.
Now we will consider another system, which is less sensitive
to errors on individual lines. It is called "multiplexing of lines". With
this system, one line in the original device is represented by a "bundle"
of many lines, most of which will carry a one when the corresponding line
in the original machine carries a one, and most would carry a zero when
the corresponding line carries a zero. The threshold level will be
denoted by $\delta$: if the fraction of lines excited in a bundle is less than
$\delta$, the bundle will be interpreted as carrying a zero. If the fraction
is greater than $1 - \delta$, it will be interpreted as a 1. If it is between $\delta$
and $1 - \delta$, the result will be considered as uncertain. This "fiduciary
level" $\delta$ does not enter into the machine, but only into the analysis
of the machine.
A majority organ for bundles can be made as follows:
Fig. 1.23
If all the lines in each of two bundles are excited, then except for ma-
jority function errors, all outputs will be excited. Similarly, it works
for all zeros on two bundles, so that the device works roughly as it
should.
Now suppose fractions a, b, and c respectively of the three
inputs are in error. Neglect errors in the majority organs. Also sup-
pose that the first two bundles carry l's, while the third carries a zero.
The largest number of errors in the output would be achieved by having
all the zeros in the first bundle matched with ones of the second and
zeros of the third. Similarly all the zeros of the second bundle should
be matched with ones of the first and zeros of the third. This would
make a fraction a+b of the outputs wrong. The same would apply if the
first two were zeros and the third a one, by the duality between zero
and one in the majority organ.
If all three inputs are ones, then there will be the most errors
in the output if every error in the output is caused by two erroneous input
lines. The number of errors in the output bundle certainly cannot exceed
half the total number of erroneous lines in all three bundles at the input.
Thus the fraction d of errors in the output satisfies
$$d \le \tfrac{1}{2}(a+b+c). \qquad (1.12)$$
(This can almost be achieved if a, b, and c are the sides of a triangle.
Otherwise, d is no greater than the sum of the two smallest of a, b, c.)
If $a = b = c$, the bound on errors at the output is $2a$ for the
first case (not all three inputs the same) and $3a/2$ for the second (all
three inputs the same). The bound we have on error probability at the
output of the organ ($2a$ or $3a/2$) is thus greater than the bound $a$ on
error probability at the input.
The error probability might decrease if we consider an av-
erage situation instead of the worst possible situation. Consider the
case in which all three bundles are carrying the same symbol (0 or 1),
and take the average over all permutations of all the erroneous lines in
each of the input bundles. Then the probability of at least two erroneous
inputs to any given majority element is
$$d = ab(1-c) + ac(1-b) + bc(1-a) + abc = ab + bc + ca - 2abc \qquad (1.13)$$
and this will also be the mean fraction of lines excited in the output
(assuming $\epsilon = 0$). In any particular case some variation from this
would be expected.
If $a = b = c$,
$$d = 3a^2 - 2a^3, \qquad (1.14)$$
the same equation which occurred before (Equations (1.1) and (1.9)
with $\epsilon = 0$), but for a different reason.
Von Neumann proposed the following as a system for restoring
the level of the fraction of lines excited in a bundle (to mean 0 or 1).
Each line in a bundle is to be split three ways, to get 3N lines. Then
these would be put through a "random permutation" black box. The outputs
would be connected to majority organs:
Fig. 1.24
This black box might be wired so that each input is connected to one
and only one output according to a table of random numbers. The idea
is to achieve the effect of independence of the inputs to any one majority
organ so that formula (1.13) applies. There is no rigorous proof
that this can be done, but it seems very plausible.
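A small simulation of the restoring scheme, assuming error-free majority organs and using a seeded pseudo-random shuffle in place of the table of random numbers. With a fraction a of lines in error, the output fraction should come out close to $3a^2 - 2a^3$ from (1.14), though not exactly, since copies of one line can occasionally meet in the same organ. Names and parameter values are hypothetical.

```python
import random


def restore(bundle, rng):
    """One restoring stage with error-free majority organs.

    Each of the N lines is split three ways, the 3N copies go through a
    random permutation, and consecutive triples feed N majority organs.
    """
    copies = [bit for bit in bundle for _ in range(3)]
    rng.shuffle(copies)                      # stands in for the permutation box
    n = len(bundle)
    return [1 if sum(copies[3 * k:3 * k + 3]) >= 2 else 0 for k in range(n)]


rng = random.Random(1)                       # fixed seed: repeatable run
N = 30_000
a = 0.10                                     # fraction of lines in error
wrong = int(a * N)
bundle = [1] * wrong + [0] * (N - wrong)     # bundle meant to carry 0
out = restore(bundle, rng)
print(sum(out) / N, 3 * a**2 - 2 * a**3)     # simulated vs. (1.14)
```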
The same analysis can be done with Sheffer stroke organs.
It could be done indirectly by noting that a majority organ can be constructed
from any set of universal organs, and hence all results which
hold for majority organs hold also for any other set of universal organs.
The error probability $\epsilon$ would have to be that for the constructed majority
organ, of course, rather than that for the basic elements themselves.
The analysis for the stroke organs is simple and interesting
enough to do in detail.
The stroke function for a bundle can be constructed as follows:
Fig. 1.25
If both inputs are supposed to be on, the result should be zero and an
error will occur if either input to an organ is off. Therefore the number
of errors in the output can be as great as the sum of the number of
errors on both inputs, but it cannot exceed this number. Therefore the
fraction of errors c is bounded by a + b. If the first bundle has a 1 and
the second a zero, the answer is supposed to be one and would become
zero only with an error in the second line, so the fraction of errors in
the output cannot exceed b. Similarly, with a zero on the first bundle
and a one on the second, the fraction of errors in the output is no greater
than a. When both inputs are zeros, only two erroneous inputs would result
in an erroneous output, in which case c is no greater than the smaller
of a and b.
If the fraction of inputs excited is a for both inputs and the
average over all permutations is considered, then an output will be 0
only if both inputs to a particular stroke organ are 1, and this would
occur on the average for a fraction $a^2$ of the lines. Therefore the fraction
of lines excited at the output would be
$$c = 1 - a^2 \qquad (1.15)$$
The curve looks like this:
Fig. 1.26
It does not restore, but rather reverses. To get restoring,
the process should be done twice. The effect of the iteration can be
found by substituting (1.15) in itself as the argument, i.e.
$$a^* = 1 - (1 - a^2)^2 = 2a^2 - a^4 \qquad (1.16)$$
Fig. 1.27
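Iterating (1.16) shows the restoring action. Solving $a = 2a^2 - a^4$ (our own step, not in the notes) gives $a(a - 1)(a^2 + a - 1) = 0$, so the fixed points in [0, 1] are 0, 1 and $(\sqrt{5}-1)/2 \approx 0.618$; fractions above the middle point are driven toward 1 and those below toward 0. The sketch assumes error-free stroke organs and the permutation average used in (1.15); function names are hypothetical.

```python
import math


def stroke_bundle(a):
    """Equation (1.15): excited fraction after one stroke (NAND) bundle stage,
    both inputs carrying excited fraction a, averaged over permutations."""
    return 1 - a**2


def double_stroke(a):
    """Equation (1.16): two cascaded stages, 1 - (1 - a**2)**2 = 2a**2 - a**4."""
    return stroke_bundle(stroke_bundle(a))


threshold = (math.sqrt(5) - 1) / 2   # unstable fixed point, about 0.618

for a0 in (0.55, 0.70, 0.95):
    a = a0
    for _ in range(30):
        a = double_stroke(a)
    print(a0, "->", round(a, 6))
```

Note that the dividing point is not 1/2, so this restoring organ treats zeros and ones asymmetrically.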
To review the design procedure, we start with a single line
machine designed for error free stroke elements. Each line is replaced
by a bundle. Each organ is replaced by a bundle organ, followed by a
pair of cascaded Sheffer stroke restoring organs with "random permutation"
black boxes.
Until now we have not considered errors in the basic organs.
Furthermore we have not considered the effect of dispersion in the num-
ber of lines excited in a bundle. We have shown only that the average
number of lines excited in a bundle can be kept under control. The prob-
ability that the deviation from this average value will cause failure must
be considered. The number of lines excited has a distribution similar to
a binomial distribution, and in our case as with the binomial distribution,
the dispersion can be made very small by making the number of lines per
bundle very large.
The rest of the analysis involves a considerable amount of
algebraic manipulation, and it will only be outlined here.
The worst case will obviously occur when the fraction of errors
on each input line is a maximum, and therefore the calculation is made for
that case. The probability distribution is calculated for each of the com-
binations of input signals, and from the probability distributions, the prob-
ability that the fraction of errors will exceed the "fiduciary level" can
be found. This will be called the probability of error for this part of the
machine. This is really a conservative estimate of probability of error,
since the machine might, and probably would, function perfectly well and
have the fraction of errors in the outputs less than the fiduciary level even
though the fiduciary level might be exceeded at certain points within the ma-
chine.
Let $\xi$ and $\eta$ be the fractions of lines carrying 1's on the two
inputs of a stroke function for bundles. These bundles are assumed to come
from different randomizing boxes and restoring systems and hence the arrangement
of the errors in the two bundles can be considered random and independent.
The probability distribution for the number of individual stroke
elements "excited" by two 1's can be calculated. It turns out to be approximately
normal for large bundles, with
$$\text{mean} = \xi\eta N, \qquad \text{variance} = \xi(1-\xi)\eta(1-\eta)N \qquad (1.17)$$
where N is the number of lines in a bundle. This mean is consistent with
the average fraction of lines excited at the output calculated before, as given
in equation (1.15).
Assuming that the individual stroke elements have probability $\epsilon$ of an error,
the probability distribution of errors in the output bundle is still approximately
normal for large bundles, with
$$\text{mean} = [\xi\eta + \epsilon(1 - 2\xi\eta)]N, \qquad \text{variance} = [(1-2\epsilon)^2\xi(1-\xi)\eta(1-\eta) + \epsilon(1-\epsilon)]N \qquad (1.18)$$
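The formulas (1.17) and (1.18) can be checked by simulation. This sketch assumes a fixed number of excited lines in each input bundle, a uniformly random permutation between the two bundles, and independent organ errors that flip an organ's behaviour; all names and parameter values are hypothetical, and agreement is only up to sampling noise.

```python
import random
import statistics


def simulate_counts(N, xi, eta, eps, trials, rng):
    """Count stroke organs that behave as if both inputs were 1.

    Bundle sizes: round(xi*N) and round(eta*N) excited lines. The second
    bundle is randomly permuted against the first on every trial, and each
    organ then errs (flips its behaviour) with probability eps.
    """
    first = [1] * round(xi * N) + [0] * (N - round(xi * N))
    second = [1] * round(eta * N) + [0] * (N - round(eta * N))
    counts = []
    for _ in range(trials):
        rng.shuffle(second)
        count = 0
        for x, y in zip(first, second):
            both = x == 1 and y == 1
            if rng.random() < eps:
                both = not both            # organ malfunction
            count += both
        counts.append(count)
    return counts


N, xi, eta, eps = 2000, 0.9, 0.8, 0.005
counts = simulate_counts(N, xi, eta, eps, trials=400, rng=random.Random(7))

pred_mean = (xi * eta + eps * (1 - 2 * xi * eta)) * N                # (1.18)
pred_var = ((1 - 2 * eps) ** 2 * xi * (1 - xi) * eta * (1 - eta)
            + eps * (1 - eps)) * N                                    # (1.18)
print(statistics.mean(counts), pred_mean)
print(statistics.variance(counts), pred_var)
```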
Now in order to continue the calculation, a fiduciary level
must be chosen. It is clear that it should not be either too near zero
or too near 1/2, and therefore there must be some optimum choice of
$\delta$. Von Neumann chose $\delta = 0.07$, but he does not explain how he arrived
at that value. For this value of $\delta$, $\epsilon$ must be no larger than
0.0107. If $\epsilon > 0.0107$, it is not possible to make the probability of
error approach zero by making N approach infinity. If $\epsilon$ is near
0.0107, the approach to zero as a function of N would be slow. Von
Neumann chose to make calculations for $\epsilon = 0.005$.
Substituting these values in the probability distribution and
integrating over the tail gives the following approximate expression for
probability of error:
$$P_E = \frac{6.4}{\sqrt{N}}\,10^{-8.6N/10{,}000} \qquad (1.19)$$
It is convenient to define a quantity E called reliability, as
$$E = -\log P_E. \qquad (1.20)$$
For this situation,
$$E = \frac{8.6N}{10{,}000} - \log 6.4 + \tfrac{1}{2}\log N \approx \frac{N}{1000} \qquad (1.21)$$
if the base 10 is chosen.
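Equation (1.21) can also be inverted numerically to find the bundle size needed for a target reliability. This sketch assumes base-10 logarithms and the approximate tail formula (1.19); solving exactly gives somewhat larger N than the rounded rule $E \approx N/1000$ (about 12,500 rather than 12,000 for E = 12, and about 25,000 rather than 23,000 for E = 23). Function names are hypothetical.

```python
import math


def reliability(N):
    """Equation (1.21), base 10: E = -log10 P_E with
    P_E = 6.4 * N**-0.5 * 10**(-8.6*N/10000) from (1.19)."""
    return 8.6 * N / 10_000 - math.log10(6.4) + 0.5 * math.log10(N)


def bundle_size_for(E_target):
    """Smallest integer N with reliability(N) >= E_target (bisection)."""
    lo, hi = 1, 1
    while reliability(hi) < E_target:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if reliability(mid) >= E_target:
            hi = mid
        else:
            lo = mid + 1
    return lo


for E in (12, 23):
    print(E, bundle_size_for(E), "vs. rule of thumb", 1000 * E)
```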
For a human brain, an overestimate of the mean free
number of single operations between errors is
$$(10^{13}\ \text{components}) \cdot (10\ \text{operations per second}) \cdot (3600 \cdot 24 \cdot 365\ \text{seconds per year}) \cdot (100\ \text{years}) \approx 10^{23}.$$
This would require E to be 23, or N to be roughly 23,000. Now a
correction is necessary: since we estimated that the "brain" had $10^{13}$
elements and since it is already multiplexed according to our assumption,
the error-free design must have had roughly $10^{13}/23{,}000 \approx 5 \times 10^8$ elements.
Using this figure in place of $10^{13}$, we find N to be roughly 18,500, a
modest change in N.
Each prototype element requires 3N elements, N for
calculating, and 2N for restoring. Thus the number of elements in the
machine is multiplied by something like 55,000.
Cutting these figures to a minimum, perhaps the number of
elements in the brain should be taken as 10^ and the time as a matter of
days instead of 100 years. This gives an estimate of the number of operations
between errors of roughly $10^{14}$ instead of $10^{23}$. This doesn't even cut N
to one half its previous value.
Of course the foregoing statements aren't meant to imply that
the brain is organized along these lines. In fact it almost certainly is not.
For small values of N this multiplexing makes the error probability greater,
and therefore gradual evolution of a system like this would be unlikely
to occur.
As another example, consider a computing machine of, say,
1000 elements with $10^5$ operations per second and perhaps a requirement
of 3 hours mean free time between errors. The mean number of operations
between errors would be roughly
$$1000 \times 10^5 \times (3600 \times 3) \approx 10^{12},$$
so that E should equal about 12. This would require an N of about 12,000,
or 36,000 times as many elements as in the original design. Of course
this assumed $\epsilon = 0.005$, which is very poor compared to actual computer
elements.
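The operation counts behind both examples are simple arithmetic. This sketch reproduces them, taking the brain at $10^{13}$ components and 10 operations per second as above, and using the rule of thumb $E \approx N/1000$ from (1.21); for the brain it gives only the first, uncorrected estimate, before the adjustment to about $5 \times 10^8$ prototype elements.

```python
import math

SECONDS_PER_YEAR = 3600 * 24 * 365

# Brain (first, uncorrected estimate): 10**13 components, 10 ops/s, 100 years.
brain_ops = 10**13 * 10 * SECONDS_PER_YEAR * 100
# Computer: 1000 elements, 10**5 ops/s, 3 hours between errors.
computer_ops = 1000 * 10**5 * 3600 * 3

for name, ops in (("brain", brain_ops), ("computer", computer_ops)):
    E = math.log10(ops)          # reliability needed: errors rarer than 1 in ops
    N = 1000 * E                 # rule of thumb E ~ N/1000 from (1.21)
    print(f"{name}: about 10^{E:.1f} operations, N ~ {N:,.0f}, "
          f"elements multiplied by ~{3 * N:,.0f}")
```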
The Portfolio Problem and How to Pay the Forecaster
These notes, taken by W. W. Peterson, cover several
lectures in the Seminar on Information Theory offered
by C.E. Shannon at M.I.T., Spring Term, 1956.
The Portfolio Problem
The following analysis, due to John Kelly, was inspired by
news reports of betting on whether or not the contestant on the TV program
"$64,000 Question" would win. It seems that one enterprising gambler on
the west coast, where the program broadcast is delayed three hours, was
receiving tips by telephone before the local telecast took place. The ques-
tion arose as to how well the gambler could do if the communication channel
over which he received the tips was noisy.
Consider first the case where there are two equally likely
events on which the gambler may bet with 1-1 odds. Suppose the gambler
receives tips which he knows are correct. Then he can double his money
each time he bets. If he starts with $V_0$ dollars, after n bets he will have
$V_n = V_0 2^n$ dollars. This is equivalent to an interest rate of 100%. This suggests
the definition of effective interest rate r (see note 1):
$$r = \frac{1}{n}\log_2\frac{V_n}{V_0} \qquad (1)$$
Now consider the case in which the tips have only probability
p > 1/2 of being correct. Probability theory states that the expected winnings
are greatest when the gambler always bets all his money on the event which
his tip indicates is most likely to occur. His probability of going broke after
n bets, however, is equal to $1 - p^n$, and this approaches one as n approaches
infinity (when p < 1).
An alternative approach would be for the gambler to bet a fraction
e of his money on each bet. If he starts with $V_0$ and wins on the first bet
he will have $2eV_0 + (1-e)V_0 = (1+e)V_0$. If he loses he will have only $(1-e)V_0$. It
is clear that each successive win multiplies his holdings by $1+e$ while each successive
loss multiplies his holdings by $1-e$. After W wins and L losses he will
have
$$V_n = (1+e)^W (1-e)^L V_0$$
dollars. The effective interest rate is
$$r_n = \frac{W}{n}\log(1+e) + \frac{L}{n}\log(1-e)$$
1. With interest rate i, after n periods,
$V_n = V_0(1+i)^n$.
Substituting this in (1) above gives
$r = \frac{1}{n}\log_2 (1+i)^n = \log_2(1+i)$.
Thus r is a simple monotone function of the interest rate in the ordinary sense,
and maximizing r is equivalent to maximizing i.
When n is large we expect the fraction of wins to be roughly p, i.e. $W/n \approx p$,
while $L/n \approx q = 1-p$.
Thus
$$r_n \approx G = p\log(1+e) + q\log(1-e).$$
This statement can be made more precise by using the laws of large numbers.
According to the weak law of large numbers, given any two positive numbers
$\epsilon$ and $\delta$, a number N can be found such that if $n \ge N$, the probability is at
least $1-\epsilon$ that $|r_n - G| \le \delta$. According to the strong law of large numbers,
given any two positive numbers $\epsilon$ and $\delta$, a number N can be found such that
the probability is at least $1-\epsilon$ that $|r_n - G| \le \delta$ after N bets and will remain
so no matter how many more bets are made. An equivalent statement is that
with probability one,
$$\lim_{n\to\infty} r_n = G.$$
No matter which way you look at it, as the number of bets be-
comes very large, the gambler becomes more and more certain that his effec-
tive interest rate will be very close to G.
G is a function of e which has a maximum for some value of e.
It is easily shown (setting $dG/de = p/(1+e) - q/(1-e)$ to zero) that the maximum
occurs when $1 + e = 2p$, and hence $1 - e = 2q$.
This gives
$$G_{\max} = p\log 2p + q\log 2q = 1 + p\log p + q\log q = 1 - H(p)$$
(logarithms to the base 2). So $G_{\max}$ is equal to the rate of transmission over the channel by which
the tips are received!
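A brute-force check of the optimum, assuming base-2 logarithms and an illustrative tip reliability p = 0.75: scanning e over a fine grid finds the maximum of G at $e = 2p - 1 = 0.5$, where G equals $1 - H(p)$. The function names are hypothetical.

```python
import math


def growth_rate(e, p):
    """G(e) = p log2(1+e) + q log2(1-e): long-run effective interest rate
    when a fraction e is bet at even money on a tip right with prob. p."""
    q = 1 - p
    return p * math.log2(1 + e) + q * math.log2(1 - e)


def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


p = 0.75
grid = (k / 10_000 for k in range(10_000))          # e = 0, 0.0001, ..., 0.9999
best_e = max(grid, key=lambda e: growth_rate(e, p))
print(best_e, 2 * p - 1)                            # optimum fraction
print(growth_rate(best_e, p), 1 - binary_entropy(p))  # G_max vs. 1 - H(p)
```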
If one gambler bets always the optimum fraction of his holdings
while a second bets a non-optimum fraction of his money on each bet, the effec-
tive interest rate for the first approaches G max with probability one while that
for the second approaches some lower value. It follows that the probability ap-
proaches one as n approaches infinity that the first gambler will have more
money than the second. The same result holds if the second gambler does not
bet a constant fraction of his money on each bet as long as he deviates from the
optimum by at least some fixed amount or at least a fixed fraction of the bets.
1. In information theory the problem often occurs of maximizing an expression
of the form
$$\sum_i A_i \log x_i$$
by optimum choice of the $x_i$ subject to the constraint that their sum is constant.
The solution is that the $x_i$ are proportional to the $A_i$.
In other words if one gambler bets according to the above scheme and a second
according to any significantly different scheme, the probability approaches one
as n approaches infinity that the first gambler will have more money than the
second after n bets.
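A short simulation illustrates the claim. It assumes even-money bets, p = 0.6, a fixed random seed, and three hypothetical betting fractions; on the same sequence of tips, the gambler betting $e = 2p - 1 = 0.2$ ends with a higher effective interest rate than one betting 0.1 or 0.4. Expected long-run rates are about 0.029 for 0.2, 0.022 for 0.1, and slightly negative for 0.4, so over-betting loses money in the long run even though every bet is favourable.

```python
import math
import random


def log_growth(fraction, outcomes):
    """log2(V_n / V_0) for a gambler who stakes `fraction` of his holdings
    on the tipped side at even money; each outcome is True if the tip was right."""
    return sum(math.log2(1 + fraction) if right else math.log2(1 - fraction)
               for right in outcomes)


rng = random.Random(2024)                    # fixed seed: repeatable run
p, n = 0.6, 20_000
outcomes = [rng.random() < p for _ in range(n)]

kelly = 2 * p - 1                            # optimum fraction
rates = {e: log_growth(e, outcomes) / n for e in (0.1, kelly, 0.4)}
for e, r in rates.items():
    print(f"fraction {e:.1f}: effective interest rate {r:+.4f}")
```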
This is not to say that this method of betting is the only way a
"rational" man would behave. While it is very persuasive in a general way, there
are situations and systems of values or utilities which would lead to other methods
of play. Thus, if the (remote) possibility of the extreme winning of $2^n V_0$ were
sufficiently important (e.g. the only possible way to save the gambler's life), he
would be well advised to bet for maximum expectation (all on the most probable event).
Now consider the more general problem in which there are m
events (outcomes of a horse race, for example) with probabilities $p_1, p_2, \ldots, p_m$.
The gambler receives a tip, one of n messages, which may not be reliable, perhaps
because of noise in the communication channel. But the gambler is assumed
to know how reliable the tips are, by knowing the probability, if event i occurred
(or will occur), of tip j:
$p_i(j)$ = probability of tip j if event i occurs.
In addition to this, the odds are assumed known:
$\alpha_i$ = dollars returned per dollar bet if i occurs. The odds will be
called fair if
$$p_i\alpha_i = 1 \qquad (7)$$
and if the equality
$$\sum_i \frac{1}{\alpha_i} = 1 \qquad (8)$$
holds, we shall say there is "no track take". (Note that "fair odds" implies
"no track take" since, by (7), $1/\alpha_i = p_i$ and $\sum_i p_i = 1$.) "No track take"
turns out to simplify the analysis greatly, since it permits covering bets with
no loss, and hence makes betting all of one's holdings on every bet no less
general than permitting holding back part of one's money. Note that if one bets
$1/\alpha_i$ dollars on each event, he will have bet exactly one dollar and will have one
dollar returned regardless of the outcome.
As an example, in pari-mutuel betting, the track takes a certain
percent of all money bet and divides the rest among the people who bet on
the winning horse. If the track takes a fraction t of the money bet, and if $n_i$ dollars are bet on the
ith event, the odds are
$$\alpha_i = \frac{(1-t)\sum_k n_k}{n_i} \qquad (9)$$
and
$$\sum_i \frac{1}{\alpha_i} = \frac{1}{1-t}. \qquad (10)$$
If there is no track take, t = 0, and
$$\sum_i \frac{1}{\alpha_i} = 1.$$
The gambler's strategy can be described by giving the fraction
of his holdings which he will bet on event i if he receives tip j. This will be
denoted by a(i/j).
First let us assume fair odds, which implies no track take. As
was stated above, this means there is no loss of generality in assuming that the
gambler bets all his holdings, since he can cover bets with no risk of loss. Then
each bet multiplies his holdings by a factor $a(i/j)\alpha_i$ if event i occurs and he had
received tip j. Suppose W(i, j) denotes the number of times he received tip j and
event i occurred in a total of n bets.
Then
$$V_n = \prod_{i,j}\,[a(i/j)\alpha_i]^{W(i,j)}\,V_0 \qquad (11)$$
This gives an effective interest rate
$$r_n = \sum_{i,j}\frac{W(i,j)}{n}\log[a(i/j)\alpha_i] \qquad (12)$$
which has as its limit with probability one
$$G = \sum_{i,j} p_i p_i(j)\log[a(i/j)\alpha_i] \qquad (13)$$
The relationship between $r_n$ and G is the same as in the simple case discussed
first.
With fair odds, $\alpha_i = 1/p_i$, and hence
$$G = \sum_{i,j} p_i p_i(j)\log a(i/j) - \sum_{i,j} p_i p_i(j)\log p_i \qquad (14)$$
Summing on j first and noting that $\sum_j p_i(j) = 1$, the last term
becomes
$$-\sum_i p_i\log p_i = H(x) \qquad (15)$$
Because of "no track take" we can assume that the gambler will bet all his
money, i.e. we can assume the constraint $\sum_i a(i/j) = 1$, and we can maximize
separately the parts of the sum in (14) for each value of the index j. As
before (equation (6)), the $a(i/j)$ must be proportional to $p_i p_i(j)$. Since
$\sum_i a(i/j) = 1$,
$$a(i/j) = \frac{p_i p_i(j)}{\sum_k p_k p_k(j)} = \frac{p(i,j)}{Q(j)} = q_j(i) \qquad (16)$$
where p(i, j) is the probability that i occurred and tip j was received, Q(j)
is the probability of tip j, and $q_j(i)$ is the probability that event i occurred
if tip j was received. Then
$$G_{\max} = \sum_{i,j} p(i,j)\log q_j(i) + H(x) = H(x) - H_y(x) = R$$
where x represents the event and y the tip. But again this is just the rate
of transmission over the communication channel carrying the tip!
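The identity $G_{\max} = R$ is easy to confirm numerically. The sketch below uses a hypothetical three-horse race and a noisy two-valued tip, bets according to (16) at fair odds, and compares the growth rate (13) with the mutual information $H(x) - H_y(x)$, all in bits. Any other betting rule, such as spreading the money evenly, gives a smaller G.

```python
import math


def kelly_growth(p, channel, odds, bets):
    """Equation (13): G = sum_{i,j} p_i p_i(j) log2(a(i/j) alpha_i).
    bets[j][i] holds a(i/j); channel[i][j] holds p_i(j)."""
    return sum(p[i] * channel[i][j] * math.log2(bets[j][i] * odds[i])
               for i in range(len(p)) for j in range(len(channel[0]))
               if p[i] * channel[i][j] > 0)


def posterior_bets(p, channel):
    """Equation (16): a(i/j) = p_i p_i(j) / Q(j) = q_j(i)."""
    bets = []
    for j in range(len(channel[0])):
        Q = sum(p[i] * channel[i][j] for i in range(len(p)))
        bets.append([p[i] * channel[i][j] / Q for i in range(len(p))])
    return bets


def mutual_information(p, channel):
    """R = H(x) - H_y(x) for event x and tip y."""
    total = 0.0
    for j in range(len(channel[0])):
        Q = sum(p[k] * channel[k][j] for k in range(len(p)))
        for i in range(len(p)):
            joint = p[i] * channel[i][j]
            if joint > 0:
                total += joint * math.log2(joint / (p[i] * Q))
    return total


p = [0.5, 0.3, 0.2]                 # hypothetical win probabilities
channel = [[0.9, 0.1],              # p_1(j)
           [0.2, 0.8],              # p_2(j)
           [0.5, 0.5]]              # p_3(j)
fair_odds = [1 / pi for pi in p]
bets = posterior_bets(p, channel)
G = kelly_growth(p, channel, fair_odds, bets)
print(G, mutual_information(p, channel))
```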
Now suppose that the odds are not necessarily fair, but that
there is still no track take. The only change is that we cannot assume that
$\alpha_i p_i = 1$, and hence the last term is $\sum_i p_i\log\alpha_i$ instead of $-\sum_i p_i\log p_i$. Denoting
this by $H(\alpha)$, G becomes
$$G = -H_y(x) + H(\alpha) = [H(x) - H_y(x)] + [H(\alpha) - H(x)] = R + R_0$$
where R is the rate of transmission of information and $R_0 = H(\alpha) - H(x)$.
$R_0$ is independent of the tips, and hence we can see its significance by considering
the case where the tips give no information. Then $R_0$ is the maximum
effective interest rate possible with no tips. $R_0$ is greater than or
equal to zero, and it equals zero only when $\alpha_i = 1/p_i$, i.e. fair odds. $R_0$
represents the maximum effective interest rate achievable by taking advantage
of the fact that the odds are not fair.
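Continuing with a hypothetical three-horse example (repeated here so the block stands alone), this sketch takes unfair odds with no track take ($\sum_i 1/\alpha_i = 1$) and checks that betting $q_j(i)$ yields $G = R + R_0$ with $R_0 = H(\alpha) - H(x) \ge 0$. The odds are invented for illustration, and logarithms are base 2.

```python
import math


def growth_unfair(p, channel, odds):
    """G with posterior bets a(i/j) = q_j(i) and arbitrary odds alpha_i."""
    G = 0.0
    for j in range(len(channel[0])):
        Q = sum(p[k] * channel[k][j] for k in range(len(p)))
        for i in range(len(p)):
            joint = p[i] * channel[i][j]
            if joint > 0:
                G += joint * math.log2(joint / Q * odds[i])
    return G


p = [0.5, 0.3, 0.2]
channel = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
odds = [2.5, 2.5, 5.0]              # hypothetical; sum(1/alpha) = 1

H_x = -sum(pi * math.log2(pi) for pi in p)
H_alpha = sum(pi * math.log2(a) for pi, a in zip(p, odds))     # "H(alpha)"
R0 = H_alpha - H_x
R = growth_unfair(p, channel, [1 / pi for pi in p])            # fair odds give R
print(growth_unfair(p, channel, odds), R + R0, R0)
```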
It is interesting to note that it is best to bet an amount of
money $a(i/j)$ proportional to $q_j(i)$ regardless of the odds. One would think
that to take best advantage of unfair odds the bets should be adjusted differently
for different odds, but this is not the case, at least for this type of betting.
The case with a track take is considerably more difficult mathematically, so the results will only be outlined
here. In general the gambler should hold back some money. Arrange the
events in order of decreasing expectation (conditional on the available information),
i.e. in order of goodness of the bets. At some point a line is drawn
and bets placed only on the events above the line. Bets are made in proportion
to the conditional probability of their occurrence, holding back some of the
money. It turns out generally that some of the events bet on, the ones just
above the line, have expectation less than one, i.e. $q_j(i)\alpha_i < 1$, even though
such bets would seem to be quite poor.
How to Pay the Forecaster
The following analysis was considered by I.J. Good in
England, and by Andy Gleason of Harvard University. The problem con-
cerns piecework payment to a consultant for predictions, the payment to
be made according to how good the prediction is.
Instead of the simple weather forecasts which are cus-
tomarily made, use a more sophisticated system in which probabilities
are given for each possible weather event. For example the weather man
might say, "The probability is one-half that it will snow, one-sixth that it
will rain, and one-third that it will be fair".
Now let us suppose that the client wishes to pay the forecaster
day-by-day, and by merit. Thus it would seem that a relatively high fee should
be paid if the forecaster assigns a high probability to the event which actually
occurs, and a low fee should be paid if the forecaster assigns a low probability.
But exactly what function of p, the probability assigned to the event that actually occurs?
Now let us consider the forecaster's viewpoint. Let us suppose
that he is more worried about how much money he will be paid than about good
forecasting. Let us assume that the function of p, f(p), which is his payment,
is known to him (as part of his contract), and let us assume he knows the probabilities
of the various events which he is attempting to forecast. Then he might
attempt to optimize his payoff mathematically by reporting a number $a_i$ as the
probability of event i instead of its true probability $p_i$. His expected payoff in
that case would be
$$\sum_i p_i f(a_i),$$
which he would maximize subject to the constraint $\sum_i a_i = 1$, since the $a_i$ must
look to the client like probabilities. Using the method of Lagrange multipliers,
we find that the $a_i$ satisfy the equation
$$p_i f'(a_i) + \lambda = 0$$
for each value of i. These equations together with the constraint equation enable
the forecaster to solve for the prediction $a_i$ which will pay best.
Now, getting back to the client's viewpoint, he would like the
prediction which he receives to equal the actual probability, i.e. $a_i = p_i$. This
will be the case if and only if
$$p_i f'(p_i) + \lambda = 0$$
for all $p_i$, or in other words if
$$x f'(x) + \lambda = 0.$$
The solution of this differential equation is
$$f(p) = -\lambda\log p + C,$$
and if this is to be a maximum, the second derivative should be negative,
or $\lambda$ should be negative. Writing $A = -\lambda$ and $B = C$,
$$f(p) = A\log p + B, \qquad A > 0.$$
Now consider what the average payment is:
$$P_{\text{ave}} = A\sum_i p_i\log p_i + B = B - A\,H(x).$$
The forecaster is paid a fixed salary from which is deducted an amount proportional
to the client's uncertainty about the predicted event after the prediction!
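A numerical illustration, assuming the logarithmic payment $f(p) = A\log p + B$ with A = 1, B = 0 and base-2 logs, and treating the snow/rain/fair probabilities from the example above as the true ones: no distorted report earns more in expectation than the true probabilities, and the honest expected payment equals $B - A\,H(x)$. The function name is hypothetical.

```python
import math


def expected_payment(reported, true, A=1.0, B=0.0, base=2):
    """Expected payoff sum_i p_i f(a_i) with f(p) = A log p + B, A > 0."""
    return sum(p * (A * math.log(a, base) + B) for p, a in zip(true, reported))


true = [1 / 2, 1 / 6, 1 / 3]      # snow, rain, fair
honest = expected_payment(true, true)

# None of these distorted reports should beat the honest one.
for report in ([0.6, 0.1, 0.3], [1 / 3, 1 / 3, 1 / 3], [0.45, 0.2, 0.35]):
    print(report, expected_payment(report, true) <= honest)

entropy = -sum(p * math.log2(p) for p in true)
print(honest, -entropy)           # B - A*H(x) with A = 1, B = 0
```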
NOTES ON RELATION OF ERROR PROBABILITY TO DELAY IN A NOISY CHANNEL
Lecture by C. E. Shannon, August 30, 1956
The ordinary coding theorems assert something about what can
be done in the limit of very long codes. They do not give information
as to how long the code must be to approach within a certain tolerance
of the limiting behavior. This question, the relation of probability
of error and length of code, is of considerable interest. Results here
bear about the same relation to earlier results as the central limit
theorem in probability bears to the law of large numbers. In fact, at a
key point in proving the theorems, the law of large numbers is used in
the first case and a generalization of the central limit theorem in the
second case.
The first type of coding theorem relates to coding a source into
binary digits (say). If the source produces letters at a regular rate
and block coding is to be used, a result may be obtained relating error
probability (this is here the probability of rare sequences for which no
binary sequences are available) and the rate at which binary digits are
available. It is convenient to use a measure of reliability, E, rather than
the probability of error directly,
$E = \frac{1}{n} \log P_e^{-1}$
where n is the block length and $P_e$ the probability of error with the best
coding. As n increases, E approaches a limit in the case of sources
described by a Markoff process. For the simplest case, that in which
the language consists of a sequence of letters chosen independently from
a finite alphabet with probability $p_i$ for the i-th letter in the alphabet,
the limiting E can be given in parametric form (parameter s) as follows.
Let $q_i(s) = p_i^s / \sum_j p_j^s$. Then if E(s) is the limiting
reliability and R(s) the rate of binary digits available for coding
(per letter of text), we have
$E(s) = \sum_i q_i(s) \log \frac{q_i(s)}{p_i}$
$R(s) = \sum_i q_i(s) \log q_i(s)^{-1}$
A complete solution can also be given in the general Markoff case but
is more involved.
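The parametric pair can be traced numerically for the independent-letter case. This sketch assumes base-2 logarithms, all $p_i > 0$, and the tilted distribution $q_i(s) = p_i^s / \sum_j p_j^s$; the three-letter source is a made-up example. At s = 1 the tilted distribution is the source itself, so R equals the entropy and E is zero; as s decreases toward 0 the rate rises toward log2 of the alphabet size and the reliability grows.

```python
import math

def tilted(p, s):
    """Tilted distribution q_i(s) = p_i^s / sum_j p_j^s."""
    weights = [pi ** s for pi in p]
    total = sum(weights)
    return [w / total for w in weights]

def reliability_and_rate(p, s):
    """Return (E(s), R(s)) in bits per letter for an independent-letter source."""
    q = tilted(p, s)
    E = sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p))
    R = sum(qi * math.log2(1.0 / qi) for qi in q)
    return E, R

p = [0.5, 0.25, 0.25]            # hypothetical source, entropy 1.5 bits
for s in (1.0, 0.75, 0.5, 0.25, 0.0):
    E, R = reliability_and_rate(p, s)
    print(f"s={s:.2f}  R={R:.4f}  E={E:.4f}")
```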
The second type of coding theorem relates to coding a sequence
(say) of binary digits into a noisy channel in such a way as to have
a small probability of error after decoding. The problem involving delay
in this case is to determine, for a block length of code n and an input
rate R, the probability of error for the optimal code. We limit ourselves
to discrete memoryless channels with finite alphabets. It is convenient
also to use a reliability measure $E = \frac{1}{n} \log P_e^{-1}$.
The problem is that of estimating E as a function of R, or, as it
turns out, E and R as functions of s. Upper and lower bounds are found
on the probability of error for codes by a number of different arguments.
The most powerful argument for showing the existence of codes is the
random coding procedure. Random codes are improved when the rate R is
small by an expurgating procedure: the elimination of code
words which are particularly close together. To establish lower bounds
on the probability of error, the most powerful argument is the
sphere packing method. This is the generalized analog of arguments to
the effect that one cannot get more than V/v spheres of volume v into a room
of volume V. The expurgated random code and the sphere packing argument
determine the asymptotic E exactly for rates R between a certain critical
value and the channel capacity. In fact, as one approaches channel
capacity, the optimal probability of error for a given delay is more and
more nearly determined. For rates below the critical rate, the bounds
diverge. Another type of lower bound on probability of error, suggested
by Elias for the binary symmetric channel, becomes more powerful in
evaluating E at these low rates. This is a bound based on the minimum separation between
words in a code. It turns out that for rates near zero the probability
of error is controlled chiefly by code words which are "close together".
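The coincidence of the bounds above the critical rate, and their divergence below it, can be seen numerically for a binary symmetric channel. This sketch does not follow the lecture's own derivation: it uses Gallager's later (1965) function $E_0(\rho)$, for which the random-coding exponent is the maximum of $E_0(\rho) - \rho R$ over $0 \le \rho \le 1$ and the sphere-packing exponent is the same maximum over all $\rho \ge 0$. The crossover probability p = 0.1, the grid step and the truncation at $\rho = 20$ are illustrative assumptions; the expurgated and Elias bounds that govern low rates are not computed. For p = 0.1 the critical rate is about 0.19 bits, and above it the ordinary random-coding exponent already meets the sphere-packing exponent.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def e0(rho, p):
    """Gallager's E_0(rho) in bits for a BSC with crossover p, equiprobable inputs."""
    a = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** a + (1 - p) ** a)

def best_exponent(rate, p, rho_max, steps_per_unit=1000):
    """Grid approximation of max over 0 <= rho <= rho_max of E_0(rho) - rho * rate."""
    best = 0.0
    for k in range(int(rho_max * steps_per_unit) + 1):
        rho = k / steps_per_unit
        best = max(best, e0(rho, p) - rho * rate)
    return best

def random_coding_exponent(rate, p):
    return best_exponent(rate, p, rho_max=1)    # achievability: rho in [0, 1]

def sphere_packing_exponent(rate, p):
    return best_exponent(rate, p, rho_max=20)   # converse: rho >= 0, truncated at 20

p = 0.1
print(f"capacity = {1 - h2(p):.4f} bits")
for rate in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"R={rate:.2f}  E_r={random_coding_exponent(rate, p):.4f}  "
          f"E_sp={sphere_packing_exponent(rate, p):.4f}")
```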
In most communication studies the analysis stops when the message
is received. No action based on the message is contemplated. John Kelly
has considered a problem in which action is taken based on the received
message; namely, the messages are assumed to be tips on the outcome of
events, and a gambler may place bets on these events. The problem is to
determine the gambler's optimal system of betting and the value of the
channel to him. It is assumed that the channel keeps operating and that
the gambler can reinvest his winnings. If after n plays of this game
the gambler has $V_n$ dollars, we define his effective interest rate as
$G = \frac{1}{n} \log_2 \frac{V_n}{V_0}$.
We assume m events (e.g. entries in a horse race)
with probabilities of occurrence $p_1, p_2, \ldots, p_m$. The gambler receives a
tip, one of n messages, which may not be reliable, but the gambler knows
the probability $p_i(j)$ of tip j if event i occurs. The available odds
for betting are $a_i$ dollars paid per dollar bet if i occurs. Odds are
called fair if $p_i a_i = 1$. We say there is no track take if
$\sum_i \frac{1}{a_i} = 1$.
In the case of no track take, it is possible to effectively hold back a
dollar by betting $1/a_i$ dollars on event i for each i, since then one dollar
is bet and one dollar is always returned. Thus, without loss of generality,
all the capital can be bet each time.
Assuming fair odds (this implies no track take), it turns out that
the expected interest rate is maximized if the gambler bets money on
event i when tip j is received in proportion to $p_j(i)$. When he bets
this way, his interest rate turns out to be
$G = H(x) - H_y(x) = R$.
That is, his interest rate is the rate of transmission in communication
theory over the channel carrying the tip. His interest rate is better
than that of any gambler who deviates significantly from this strategy
(with probability 1), that is, any gambler who does not bet this way a
fraction of the time $> \epsilon > 0$.
If there is no track take but the odds are not necessarily fair,
it turns out that the best interest rate becomes
$G = R + R_0$
where R is the rate of transmission for the channel and $R_0$ is the
effective interest rate with no tips. It is the rate of interest
one can obtain from the fact that the probabilities $p_i$ are not equal to the
reciprocals $1/a_i$ of the betting odds.
The situation is somewhat more complex when there is a track take.
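To make these results concrete, here is a sketch that computes the expected growth exactly rather than by simulation. Everything in the example is hypothetical: two equally likely horses, a tip that is right 90% of the time, and two sets of odds with no track take. It reads $p_j(i)$ as the conditional probability of event i given tip j, uses base-2 logs, bets the whole bankroll every play (permitted above when there is no track take), and takes the expected per-play log growth as the long-run rate G, which it equals with probability 1 by the law of large numbers. Here $R_0 = \sum_i p_i \log_2(p_i a_i)$, the growth from the odds alone when the gambler bets $p_i$ on each event.

```python
import math

def kelly_growth(p_event, p_tip_given_event, odds):
    """Expected log2 growth per play when, on tip j, the whole bankroll is
    split over the events in proportion to the posterior p_j(i)."""
    m, n = len(p_event), len(p_tip_given_event[0])
    growth = 0.0
    for j in range(n):
        p_tip = sum(p_event[i] * p_tip_given_event[i][j] for i in range(m))
        for i in range(m):
            joint = p_event[i] * p_tip_given_event[i][j]
            if joint > 0:
                fraction_bet = joint / p_tip          # posterior p_j(i)
                growth += joint * math.log2(fraction_bet * odds[i])
    return growth

def transmission_rate(p_event, p_tip_given_event):
    """R = H(x) - H_y(x): mutual information between event and tip, in bits."""
    m, n = len(p_event), len(p_tip_given_event[0])
    p_tip = [sum(p_event[i] * p_tip_given_event[i][j] for i in range(m)) for j in range(n)]
    rate = 0.0
    for i in range(m):
        for j in range(n):
            joint = p_event[i] * p_tip_given_event[i][j]
            if joint > 0:
                rate += joint * math.log2(joint / (p_event[i] * p_tip[j]))
    return rate

def no_tip_rate(p_event, odds):
    """R_0 = sum_i p_i log2(p_i a_i), assuming no track take."""
    return sum(pe * math.log2(pe * a) for pe, a in zip(p_event, odds) if pe > 0)

p = [0.5, 0.5]                        # hypothetical: two equally likely horses
channel = [[0.9, 0.1], [0.1, 0.9]]    # channel[i][j] = p_i(j), tip right 90% of the time
R = transmission_rate(p, channel)

fair = [2.0, 2.0]                     # a_i = 1/p_i
unfair = [3.0, 1.5]                   # 1/3 + 2/3 = 1, so still no track take
print(kelly_growth(p, channel, fair), R)                               # equal
print(kelly_growth(p, channel, unfair), R + no_tip_rate(p, unfair))   # equal
```

With fair odds the growth rate equals the transmission rate of the tip channel (about 0.531 bits per play here); with unfair odds and no track take it is larger by $R_0$.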
Reference: John Kelly, "A New Interpretation of Information Rate,"
Bell System Technical Journal, July 1956.
The Fourth-Dimensional Twist
or
A Modest Proposal in Aid of the American Driver in England
Claude E. Shannon
An American driving in England is confronted with a wild and
dangerous world. The cars have the driver on the right and he is
supposed to drive on the left side of the road. It is as though
English driving is a left-handed version of the right-handed American
system.
I can personally attest to the seriousness of this problem.
Recently my wife and I, together with another couple on an extended visit
to England, decided to jointly rent a car. Usually when we drove the men
would sit in the front seat, the women in the back. With our long-ingrained
driving habits the world seemed totally mad. Cars, bicycles and pedestrians
would dart out from nowhere and we would always be looking in the wrong
direction. The car was usually filled with curses from the men and with
screams and hysterical laughter from the women as we careened from one
narrow escape to another. The passengers were given to sudden involuntary
motions - shielding the face or slamming on non-existent brakes. The turn
indicator and windshield wiper controls were also reversed from American
practice and we found ourselves signaling turns with the windshield wiper -
fast for a right turn, slow for a left. The whole driving situation was
not particularly improved by the narrowness of English streets and the high
speed of English drivers. Nor was our inner security increased by the
predilection of the English for building stone walls immediately adjacent
to the roads.
This paper will develop a novel solution to this problem which
incidentally can also be used for the Englishman driving in America.
* This research was carried out in Trinity term, 1978, while the author was
a Visiting Fellow at All Souls College, Oxford.
In Fig. 1 we see two triangles. They are congruent but
one cannot be slid around in the plane to coincide with the other
since one is, so to speak, a left-handed version of the other. A
"flatlander", limited to living in the plane, could scarcely
conceive how triangle A could be moved into coincidence with B, but
we, as three-dimensional beings, easily understand rotating the
triangle A about one of its sides and then sliding it into
coincidence with the other.(1)
In an analogous way, in three dimensions we often have
right- and left-handed objects - a pair of gloves, for example, or
an American car compared to an English car of the same type. If we
had access to a fourth dimension, we could turn a left-handed glove
180° through the fourth dimension and it would reenter the third
dimension as a right-handed glove.(2) This facility would be useful
in many ways. Both shoemakers and screwmakers would benefit. The
former would need only right-footed lasts, the latter only right-handed
taps and dies. Left-handed children could be flipped through the
fourth dimension to become right-handed, since the world of tools,
writing, etc., is for the most part more friendly to the right-handed.
Contrariwise, right-handed baseball pitchers might choose to become
southpaws. Our American driver coming to England might choose to
undergo this fourth-dimensional twist which would turn his perception
of England from left-handed to right-handed.
Alas, no one has found a method to rotate an object through
the fourth dimension. However, equally effective would be a rotation
for our American driver of all of England through the fourth dimension.
This concept no doubt sounds grandiose and utterly impractical - the
idle dream of a mathematician - but we will show that it is not
only a theoretical possibility but within the range of present-day
technology.
Fig. 1
How will we do this? In a word, with mirrors. If you
hold your right hand in front of a mirror, the image appears as a
left hand. If you view it in a second mirror, after two reflections
it appears now as a right hand, and after three reflections again as
a left hand, and so on.
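The parity argument can be checked with matrices. In this idealized sketch each plane mirror is modeled as a linear reflection through the origin (ignoring path length and where the image appears): a mirror with normal n acts as the Householder matrix $I - 2nn^T/(n \cdot n)$, whose determinant is -1. The sign of the determinant of the accumulated product tells whether the view is mirror-reversed (-1) or handedness-preserving (+1). The mirror normals below are arbitrary and are not the layout of the figures.

```python
def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def mirror(normal):
    """Reflection I - 2 n n^T / (n . n) in a plane mirror with the given normal."""
    nn = sum(x * x for x in normal)
    return [[(1.0 if r == c else 0.0) - 2 * normal[r] * normal[c] / nn
             for c in range(3)] for r in range(3)]

def handedness_signs(normals):
    """Sign of the determinant after each successive reflection:
    -1 means the view is mirror-reversed, +1 means handedness is preserved."""
    M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    signs = []
    for n in normals:
        M = matmul(mirror(n), M)
        signs.append(1 if det3(M) > 0 else -1)
    return signs

# Three arbitrary mirror orientations (hypothetical, not the paper's layout)
print(handedness_signs([(1, 0, 0), (0, 1, 1), (1, 1, 0)]))   # [-1, 1, -1]
```

Whatever the orientations, an odd number of reflections gives -1 and an even number +1, which is why a triple reflection reverses the view while a double reflection preserves it.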
Our general plan is to encompass our American driver with
mirror systems which reflect his view of England an odd number of
times. Thus he sees the world about him not as it is but as it would
be after a 180° fourth-dimensional rotation.
To accomplish this we have two mirror systems. The side
mirror system is shown in Fig. 2, where we see the driver, from the
back, sitting in his English car. There are five mirrors in the car,
two on his right, two on his left, and one above his head. These
serve to reflect images from the left over his head and down again so
they come in from the right.(3) Similarly, light rays from the right
are reflected over his head and down to come in from his left. Thus,
if he turns his head to the right side of the page, he will see, by a
triple reflection, an image of the object (an arrow) which is on the
left of the page. In the same manner, if he looks to the left of the
drawing, he will see what is on the right of the car.
To summarize, this group of five mirrors is so arranged that
when he looks to his right he will see what is on his left - when he
looks to his left he will see what is on his right.
Another set of mirrors provides for forward and backward
vision. These are shown in Fig. 3, where we see the driver from above.
For forward vision, three mirrors reflect the front visual field
about a vertical axis.(4) Some light rays are indicated by letters
A and B to show how the interchange of right and left takes place.
The object (the usual arrow) appears to the driver as a reversed
image (again somewhat farther away because of the longer path).
A second set of three mirrors accommodates vision in the
backward direction. If our driver should turn his head around,
perhaps in driving in reverse or possibly to look at his passengers
in the back seat, he will again see a left-right reversed image.
These four mirror systems totally encompass our American
driver. Wherever he looks, he sees a reversed image of England -
always reflected three times. For him, England has been rotated
180° through the fourth dimension!
A further detail must be accounted for here. The rear-
vision mirror in an ordinary car corresponds to one reflection - in
looking through it we see words reversed and, in fact, catch a tiny
glimpse of the left-handed world we have been talking about. To keep
our system consistent, and to keep our American driver comfortable,
we have devised a rear-vision mirror using a double reflection, as
shown in Fig. 3. The driver looks up and to the right, as he would
in an American car, and sees out by a double reflection through the
rear window. This gives him the only glimpse he has of the real
"right-hand" world, since a double reflection preserves handedness.
In Fig. 5 we see from above a car fitted with the fourth-dimensional
twister. The actual car, as well as the actual English
road and countryside, is shown in heavy solid lines. In reality,
the car is parked on the left side of the road. Another car is
forward to the right and the road turns sharply to the right. The
driver's perception, however, because of his mirror system which
reflects everything about the line XY, is that he is parked on the
right side of the road, that the other car is at his left, and that
the road turns sharply to the left. His perception of this situation
is shown in dotted lines. Note that he even perceives his own car to
have changed to an American car, and his passenger, P, on the front
seat now appears to be on his right!
Fig. 5
Entering this car may be a bit of a shock, when the entire
world is reflected about a plane through the driver's seat, but after
a moment our American will feel comfortable and at home, with everything
as it "should be". He starts his engine and drives down the road. The
road actually turns sharply to the right. In his perception of course
it turns sharply to the left, so of course he turns to the left,
directly into the stone wall, and is instantly killed.
This, of course, is what would have happened had we not fore-
seen his natural reactions to a reversed perception of the world. One
must reverse not only the sensory input but also the motor output. Fig. 6
shows an attachment to the steering wheel which reverses its operation.
When turned to the right, the vehicle actually turns to the left and
vice versa. This operates much like the differential gears in automobiles.
With this addition our American driver will perceive a curve
to the left and, in natural response, turn to the left. In fact the
curve will be to the right and the mechanism will reverse his intent
and turn the car to the right.
Fig. 6
This, then, is the basic idea of the fourth-dimensional twist.
There are, however, some loose ends to be dealt with. The perceptive
reader may wonder about road signs. Our American driver, viewing
everything through a triple reflection, sees all of the road signs
in reverse, as, for example, in Fig. 7. How is he to find his way
about? The answer is ridiculously simple. We have already pointed
out that his rear-vision mirror gives a double reflection and hence
a normal view of the real world. All he need do is back his car up
to the road sign and read it through his rear-vision mirror!
A more troublesome problem is that of centrifugal force.
In the situation of Fig. 5, our driver is actually turning to the
right but perceives himself to be turning to the left. Centrifugal
force will opt for actuality. Our driver will, surprisingly, find
himself driven to the inside of the curve rather than the outside,
a most uncomfortable and confusing sensation.
Solving this problem, the reversal of centrifugal force,
might seem as impossible as the twist of England through the fourth
dimension. After all, centrifugal force is given by the formula
$f = \frac{m v^2}{R}$
A radius R of course is always positive, $v^2$ as a square is necessarily
positive, and surely a mass m must be positive, so how can we arrange
for the centrifugal force f to be negative? Like Columbus and the egg,
the answer is very simple when given. If we immerse the mass in a
liquid of higher density, it acts as though it, itself, had a negative
mass. The liquid itself presses the object in the direction of
acceleration!
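The sign reversal follows from Archimedes' principle applied to the car's acceleration instead of to gravity. In the car's frame the driver feels $m v^2/R$ outward, while the displaced liquid, of mass $\rho_{liquid} V$, pushes him inward with $\rho_{liquid} V v^2/R$, so the net force is $(m - \rho_{liquid} V) v^2/R$. This sketch assumes the driver's body has a specific gravity of about 1 (close to water) and uses made-up values for mass, speed and radius; it ignores the suit, seatbelt and snorkel.

```python
def apparent_lateral_force(mass_kg, body_sg, liquid_sg, speed_m_s, radius_m):
    """Net 'centrifugal' force in newtons on a body immersed in liquid in a
    turning car. Positive means pushed outward, negative means pushed inward."""
    displaced_mass = mass_kg * liquid_sg / body_sg   # liquid filling the body's volume
    effective_mass = mass_kg - displaced_mass        # negative if the liquid is denser
    return effective_mass * speed_m_s ** 2 / radius_m

# Hypothetical numbers: 80 kg driver, 20 m/s around a 50 m radius curve
print(apparent_lateral_force(80, 1.0, 0.0, 20, 50))   # 640.0  (no liquid: outward)
print(apparent_lateral_force(80, 1.0, 2.0, 20, 50))   # -640.0 (specific gravity 2: inward)
```

With a liquid of specific gravity 2 and a body of specific gravity 1, the effective mass is -m, so the driver is pressed inward exactly as hard as he would otherwise be pressed outward.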
This concept is shown in Fig. 8. Our driver is now enclosed
in a scuba-diving suit within a compartment which is filled with a
liquid having a specific gravity of approximately 2. Of course he
would tend to rise in this liquid, but he is held down firmly by his
seatbelt. A snorkel provides for his breathing and altogether, with
our various devices, he feels very much as though he were at home in
America!
Fig. 7
Fig. 8
FOOTNOTES:
1. Mathematically, such a rotation (about the y axis) can be
represented by the transformation
$(x, y, 0) \rightarrow (x \cos\theta,\; y,\; x \sin\theta)$.
As θ goes from 0 to 180°, the point (x, y, 0) rotates out of the
original plane about the y axis and back into the plane,
becoming (-x, y).
2. The mathematical analogue of the previous transformation
is $(x, y, z, 0) \rightarrow (x \cos\theta,\; y,\; z,\; x \sin\theta)$, with θ going from 0
to 180°. The point (x, y, z) rotates about the y,z plane and
ends up back in the three-dimensional space as (-x, y, z, 0),
a mirror image with the y,z plane as the mirror.
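Footnote 2's rotation can be checked numerically. This minimal sketch rotates in the plane of x and a fourth coordinate w (a name introduced here), leaving the y,z plane fixed, and follows one point of an ordinary three-dimensional object as θ goes from 0 to 180°.

```python
import math

def rotate_xw(point, theta_deg):
    """Rotate (x, y, z, w) by theta in the x-w plane; the y-z plane stays fixed."""
    x, y, z, w = point
    t = math.radians(theta_deg)
    return (x * math.cos(t) - w * math.sin(t), y, z, x * math.sin(t) + w * math.cos(t))

def clean(v, digits=12):
    # Round away floating-point noise; adding 0.0 turns -0.0 into 0.0
    return tuple(round(c, digits) + 0.0 for c in v)

glove_point = (1.0, 2.0, 3.0, 0.0)          # an ordinary 3-D point (w = 0)
for theta in (0, 90, 180):
    print(theta, clean(rotate_xw(glove_point, theta)))
```

At 90° the point has left ordinary space (w = 1); at 180° it is back with w = 0 and x negated, that is, reflected in the y,z plane.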
3. The image is shown here, for simplicity, at the same distance
from the driver as the object. Actually, it would appear
somewhat farther because of the "detour" around his head. This
difference would be only a foot or so, but should be kept in
mind in close driving situations.
4. In Fig. 4 we have shown another way of achieving the desired
reflection of the forward and backward fields using large
"roof" prisms in place of the triple mirrors. While more
costly, this method would considerably reduce the distance
distortion.
A Rubric on Rubik Cubics(1)
Claude E. Shannon
Once puzzledom was laissez faire
With rebus, crosswords, solitaire.
Comes now the Rubik Magic Cube
For Ph. D. or country rube.
This fiendish clever engineer
Entrapped the music of the sphere.
It's sphere on sphere in all 3 D -
A kinematic symphony!
Ta! Ra! Ra! Boom De Ay!
One thousand bucks a day.
That's Rubik's cubic pay.
He drives a Chevrolet.(2)
Forty-three quintillion plus(3)
Problems Rubik posed for us.
Numbers of this awesome kind
Boggle even Sagan's mind.(4)
Some chaps pry their cubes apart
Then reassemble to the "start".
Not cricket! A rude game's afoot
And up with which we will not put!
Ta! Ra! Ra! Boom De Ay!
Cu-bies in disarray?
First twist them that-a-way,
Then turn them this-a-way.
Respect your cube and keep it clean.
Lube your cube with Vaseline.
Beware the dreaded cuber's thumb,
The callused hand and fingers numb.(5)
No borrower nor lender be.
Rude folk might switch two tabs on thee,
The most unkindest switch of all,
Into insolubility. (6)
In-sol-u-bility.
The crudest place to be.(7)
However you persist
Solutions don't exist.
While most folk watch the idiot tube
Cubemeisters spin the Rubik cube.
Minh Tai's the champ — he's fast as sin.
Minh solves his cube in half a min.(8)
John Conway leads a Cambridge pack
And solves his cube behind his back.(9)
Singmaster writes THE BOOK — first rank;
Now cubes while riding to the Bank.(10)
Here now a heavyweight!
Programming potentate!
Software sophisticate!
Morwen B. Thistlethwaite!(11)
Eschewing this dull 3 D place
Joe Buhler cubes in hyperspace.(12)
All hail Dame Kathleen Ollerenshaw,
A mayor with fast cubic draw.(13)
Is cubing just a crashing bore?
Let Talken's robot do this chore.(14)
God moves in geodesic ways
And solves His cube in twenty plays.(15)
Cubemeisters one and all,
Their cubes find final rest
Bronzed in the Hall of Fame
In lovely Budapest.
The battle's joined in steely grip:
Man's mind against computer chip,
With theorems wrought by Conway's eight
'Gainst programs writ by Thistlethwaite.
Can multi-billion neuron brains
Beat multi-megabyte machines?
The thrust of this theistic schism —
To ferret out God's algorism!
CODA:
He (hooked on cubing) with great enthusiasm:
Ta! Ra! Ra! Boom De Ay!
Men's schemes gang aft agley.
Let's cube our life away!
She: Long pause (having been here before):
OY VAY!
(1) When T. S. Eliot published "The Waste Land" in 1922 with a wealth of footnotes, there was
considerable commotion among the critics: should a work of art stand on its own feet or refer to
such weighty tomes as The Golden Bough? The ambiguity, obscurity and even prurience of modern
poetry are also under attack. We intend this to be clean as a hound's tooth, crystal clear, sensible as
a dictionary, and with footnotes galore.
First off, this may be either read as a poem or, better, sung to "Ta! Ra! Ra! Boom De Ay!" (with
an eight-bar chorus). The verses should be sung solo, in a slightly bitter sardonic manner, a la Noel
Coward or Bea Lillie; the choruses, in contrast, a joyous rousing salute to the cube.
(2) A little poetic license here: the Wall Street Journal, Sept. 23, 1981, reports Rubik as receiving
$30,000 a month from cubic royalties, but driving a "run-down rattling Polski Fiat". This would
neither scan nor rhyme as well as Chevrolet.
(3) There are
$\frac{8!\; 3^8\; 12!\; 2^{12}}{2 \cdot 3 \cdot 2} = 43\,252\,003\,274\,489\,856\,000$
possible arrangements of the cube.
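The count in footnote (3) is the standard one: all arrangements and orientations of the 8 corner and 12 edge cubies, divided by 12 because only one-twelfth of them can be reached by legal twists. A short check of the arithmetic:

```python
from math import factorial

corner_perms = factorial(8)    # arrangements of the 8 corner cubies
corner_twists = 3 ** 8         # orientations of the corners
edge_perms = factorial(12)     # arrangements of the 12 edge cubies
edge_flips = 2 ** 12           # orientations of the edges
# Legal twists fix the total corner twist (factor 3), the edge-flip parity
# (factor 2), and tie corner and edge permutation parities together (factor 2).
reachable = corner_perms * corner_twists * edge_perms * edge_flips // (2 * 3 * 2)
print(f"{reachable:,}")   # 43,252,003,274,489,856,000
```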
(4) It would take W/Zions and W//ions of "billions and billions" for forty-three quintillion plus.
(5) While not as debilitating as weaver's bottom or hooker's elbow, cuber's thumb can be both painful
and frustrating. For more on these occupational ailments see recent issues of "The New England
Journal of Medicine".
(6) A friend of mine, Pete, an expert cuber, told me of encountering a friend Bill at a hobby shop. Bill
gave Pete his cube, saying that he had been working for days without success. After a few minutes,
Pete turned it into a position where he could see that two tabs had been interchanged.
Pete: Bill, somebody has switched two tabs on your cube.
Bill: That's impossible. I've always carried it, or left it in my apartment, and nobody has keys
to get in there.
Pete: Nobody?
Bill: That's right, nobody. Just me and my girlfriend.
(7) Especially in April.
(8) Minh Tai, World Speed Champion, in a public demonstration solved six scrambled cubes, each in
less than 30 seconds.
(9) Actually, he peeks a little. John Conway, the great Cambridge combinatorialist, in addition to his
tour de force blindfold cubing has, with his colleagues, contributed much to Rubik cube theory.
(10) Singmaster, David. Notes on Rubik's Magic Cube, now in its sixth edition.
(11) A pioneer in programming computers to solve the cube. His program solves the cube in 52 or fewer
moves.
(12) Group theorist Buhler and his colleagues have developed a theory of higher dimensional cubes.
(13) Renaissance woman, sometime mayor of Manchester, recreational mathematician, expert cubist and
discoverer of the cubist thumb syndrome and its relation to the fetlock problem in horses.
(14) In October 1981 the writer foresaw the need for a cubing machine and sketched the design of a pair
of mechanical hands to be connected to a computer and manipulate a cube. In the summer of 1982
a crack team of one M.I.T. student was assembled. Late in July the hands were making their first
fumbling attempts to hold and manipulate a cube, when we received a crushing newspaper clipping
from a friend. It seems that Dan Talken had assembled a crack team of Southern Illinois University
students and beat us to the punch. My friend wrote one word across the clipping: "Scooped!"
(15) Or so Singmaster finds it tempting to conjecture.